Cross Reference To Related Applications 
This application claims the benefit of prior U.S. provisional application 60/102,239, filed 
September 29, 1998, and prior U.S. provisional application 60/130,241, filed April 20, 1999, the 
contents of which are herein incorporated by reference. 

Field of the Invention 

The invention is directed to methods for optimizing the properties of mRNA molecules, 
optimized mRNA molecules, methods of using optimized mRNA molecules, and compositions 
which include optimized mRNA molecules. 

Background of the Invention 

In eukaryotes, gene expression is affected, in part, by the stability and structure of the 
messenger RNA (mRNA) molecule. mRNA stability influences gene expression by affecting the 
steady-state level of the mRNA; it can affect the rates at which the mRNA disappears following 
transcriptional repression and accumulates following transcriptional induction. The structure and 
nucleotide sequence of the mRNA molecule can also influence the efficiency with which these 
individual mRNA molecules are translated. 

The intrinsic stability of a given mRNA molecule is influenced by a number of specific 
internal sequence elements which can exert a destabilizing effect on the mRNA. These elements 
may be located in any region of the transcript and can be found, e.g., in the 5' untranslated region 
(5'UTR), in the coding region and in the 3' untranslated region (3'UTR). It is well established 
that shortening of the poly(A) tail initiates mRNA decay (Ross, Trends in Genetics, 12:171-175, 
1996). The poly(A) tract influences cytoplasmic mRNA stability by protecting mRNA from 
rapid degradation. Adenosine and uridine rich elements (AUREs) in the 3'UTR are also 
associated with unstable mammalian mRNAs. It has been demonstrated that proteins that bind 
to AUREs, AURE-binding proteins (AUBPs), can affect mRNA stability. The coding region can 
also alter the half-life of many RNAs. For example, the coding region can interact with proteins 
that protect it from endonucleolytic attack. Furthermore, the efficiency with which individual 
mRNA molecules are translated has a strong influence on the stability of the mRNA molecule 
(Herrick et al., Mol Cell Biol. 10, 2269-2284, 1990, and Hoekema et al., Mol Cell Biol. 7, 
2914-2924, 1987). 

The single-stranded nature of mRNA allows it to adopt secondary and tertiary structure in 
a sequence-dependent manner through complementary base-pairing. Examples of such 
structures include RNA hairpins, stem loops and more complex structures such as bifurcations, 
pseudoknots and triple-helices. These structures influence both mRNA stability (e.g., the stem 
loop elements in the 3' UTR can serve as an endonuclease cleavage site) and translational 
efficiency. 

In addition to the structure of the mRNA, the nucleotide content of the mRNA can also 
play a role in the efficiency with which the mRNA is translated. For example, mRNA with a 
high GC content at the 5' untranslated region (UTR) may be translated with low efficiency and a 
reduced translational effect can reduce message stability. Thus, altering the sequence of an 
mRNA molecule can ultimately influence mRNA transcript stability, by influencing the 
translational stability of the message. 

Factor VIII and Factor IX are important plasma proteins that participate in the intrinsic 
pathway of blood coagulation. Their dysfunction or absence in individuals can result in blood 
coagulation disorders, e.g., a deficiency of Factor VIII or Factor IX results in Hemophilia A or 
B, respectively. Isolating Factor VIII or Factor IX from blood is difficult, e.g., the isolation of 
Factor VIII is characterized by low yields, and also has the associated danger of being 
contaminated with infectious agents such as Hepatitis B virus, Hepatitis C virus or HIV. 
Recombinant DNA technology provides an alternative method for producing biologically active 
Factor VIII or Factor IX. While these methods have had some success, improving the yield of 
Factor VIII or Factor IX is still a challenge. 

An approach to increasing protein yield using recombinant DNA technology is to modify 
the coding sequence of a protein of interest, e.g., Factor VIII or Factor IX, without altering the 
amino acid sequence of the gene product. This approach involves altering, for example, the 
native Factor VIII or Factor IX gene sequence such that codons which are not so frequently used 
in mammalian cells are replaced with codons which are overrepresented in highly expressed 
mammalian genes. Seed et al., (WO 98/12207) used this approach with a measure of success. 
They found that substituting the rare mammalian codons with those frequently used in 
mammalian cells results in a four-fold increase in Factor VIII production from mammalian cells. 
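
The codon-replacement approach described above can be sketched in a few lines of Python. The table of preferred codons below is a hypothetical, partial illustration (it is not taken from this document or from WO 98/12207), and the function name `replace_rare_codons` is likewise an assumption made for the example.

```python
# Hypothetical, partial map from a codon to a synonymous "preferred" codon.
# A real design would use a complete, organism-specific codon usage table.
PREFERRED = {
    "GCG": "GCC", "GCA": "GCC", "GCT": "GCC",  # Ala
    "CTA": "CTG", "TTA": "CTG", "TTG": "CTG",  # Leu
    "AAA": "AAG",                              # Lys
    "GGT": "GGC", "GGA": "GGC",                # Gly
}


def replace_rare_codons(cds: str) -> str:
    """Swap every codon listed in PREFERRED for its synonymous replacement."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # Codons that are not in the table are left unchanged.
    return "".join(PREFERRED.get(codon, codon) for codon in codons)


original = "ATGGCGTTAAAAGGT"  # Met-Ala-Leu-Lys-Gly
print(replace_rare_codons(original))
```

Because every entry maps a codon to a synonym for the same amino acid, the encoded protein is unchanged; only the nucleotide sequence differs.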

Summary of the Invention 

In one aspect, the invention features, a synthetic nucleic acid sequence which encodes a 
protein, or a portion thereof, wherein at least one non-common codon or less-common codon has 
been replaced by a common codon, and wherein the synthetic nucleic acid sequence includes a 
continuous stretch of at least 90 codons all of which are common codons. 

The synthetic nucleic acid can direct the synthesis of an optimized messenger mRNA. In 
a preferred embodiment the continuous stretch of common codons can include: the sequence of a 
pre-pro-protein; the sequence of a pro-protein; the sequence of a mature protein; the "pre" 
sequence of a pre-pro-protein; the "pre-pro" sequence of a pre-pro-protein; the "pro" sequence of 
a pre-pro or a pro-protein; or a portion of any of the aforementioned sequences. 

In a preferred embodiment, the synthetic nucleic acid sequence includes a continuous 
stretch of at least 90, 95, 100, 125, 150, 200, 250, 300 or more codons all of which are common 
codons. 
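
The "continuous stretch" criterion can be checked mechanically. The following is a minimal sketch, assuming a hypothetical set `COMMON` of common codons (illustrative only, not the set used by this document) and a hypothetical helper `longest_common_run`.

```python
# Hypothetical set of "common" codons, for illustration only.
COMMON = {"ATG", "GCC", "CTG", "AAG", "GGC", "GAG", "ACC", "TGG"}


def longest_common_run(cds: str, common=COMMON) -> int:
    """Return the length, in codons, of the longest unbroken run of common codons."""
    cds = cds.upper()
    best = current = 0
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        if cds[i:i + 3] in common:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


# 95 common codons, one codon outside the set, then 10 more common codons.
seq = "GCC" * 95 + "GCG" + "CTG" * 10
print(longest_common_run(seq) >= 90)
```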

In another preferred embodiment, the nucleic acid sequence encoding a protein has at 
least 30, 50, 60, 75, 100, 200 or more non-common or less-common codons replaced with a 
common codon. 

In a preferred embodiment, the number of non-common or less-common codons replaced 
is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In a preferred embodiment, the number of non-common or less-common codons 
remaining is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In preferred embodiments, the non-common and less-common codons replaced, taken 
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic 
acid sequence. 

In preferred embodiments, the non-common and less-common codons remaining, taken 
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic 
acid sequence. 

In a preferred embodiment, all of the non-common or less-common codons of the 
synthetic nucleic acid sequence encoding a protein have been replaced with common codons. 

In a preferred embodiment, the synthetic nucleic acid sequence encodes a protein of at 
least about 90, 95, 100, 105, 110, 120, 130, 150, 200, 500, 700, 1000 or more amino acids in 
length. 

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all, of 
the codons in the synthetic nucleic acid sequence are common codons. Preferably, all of the 
codons in the synthetic nucleic acid sequence are common codons. 

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a 
mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human 
protein. 

In another aspect, the invention features, a synthetic nucleic acid sequence which encodes 
a protein, or a portion thereof, wherein at least one non-common codon or less-common codon 
has been replaced by a common codon, and wherein the synthetic nucleic acid sequence includes 
a continuous stretch of common codons, which continuous stretch includes at least 33% or more 
of the codons in the synthetic nucleic acid sequence. 

The synthetic nucleic acid can direct the synthesis of an optimized messenger mRNA. In 
a preferred embodiment the continuous stretch of common codons can include: the sequence of 
a pre-pro-protein; the sequence of a pro-protein; the sequence of a mature protein; the "pre" 
sequence of a pre-pro-protein; the "pre-pro" sequence of a pre-pro-protein; the "pro" sequence of 
a pre-pro or a pro-protein; or a portion of any of the aforementioned sequences. 

In a preferred embodiment, the synthetic nucleic acid sequence includes a continuous 
stretch of common codons wherein the continuous stretch includes at least 35%, 40%, 50%, 
60%, 70%, 80%, 90%, 95% or 100% of codons in the synthetic nucleic acid sequence. 
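
The fraction of a sequence covered by its longest continuous stretch of common codons can be computed as follows. This is a sketch only: the set `COMMON` and the function `longest_run_fraction` are hypothetical names introduced for illustration.

```python
from itertools import groupby

# Hypothetical set of "common" codons, for illustration only.
COMMON = {"ATG", "GCC", "CTG", "AAG", "GGC", "GAG", "ACC", "TGG"}


def longest_run_fraction(cds: str, common=COMMON) -> float:
    """Fraction of all codons that lie in the longest run of common codons."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons:
        return 0.0
    # groupby splits the codon list into alternating runs of common / not common.
    runs = [
        sum(1 for _ in group)
        for is_common, group in groupby(codons, key=lambda c: c in common)
        if is_common
    ]
    return max(runs, default=0) / len(codons)


# 4 common codons, 1 codon outside the set, 5 common codons: longest run is 5 of 10.
print(longest_run_fraction("GCC" * 4 + "GCG" + "CTG" * 5) >= 0.33)
```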

In a preferred embodiment, the number of non-common or less-common codons replaced 
is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In a preferred embodiment, the number of non-common or less-common codons 
remaining is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In preferred embodiments, the non-common and less-common codons replaced, taken 
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic 
acid sequence. 

In preferred embodiments, the non-common and less-common codons remaining, taken 
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic 
acid sequence. 

In a preferred embodiment, all of the non-common or less-common codons of the 
synthetic nucleic acid sequence encoding a protein have been replaced with common codons. 

In a preferred embodiment, all non-common and less-common codons are replaced with 
common codons. 

In a preferred embodiment, the synthetic nucleic acid sequence encodes a protein of at 
least about 90, 95, 100, 105, 110, 120, 130, 150, 200, 500, 700, 1000 or more amino acids in 
length. 

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all, of 
the codons in the synthetic nucleic acid sequence are common codons. Preferably, all of the 
codons in the synthetic nucleic acid sequence are common codons. 

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a 
mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human 
protein. 

In another aspect, the invention features, a synthetic nucleic acid sequence which encodes 
a protein, or a portion thereof, wherein at least one non-common codon or less-common codon 
has been replaced by a common codon, and wherein the number of non-common and less- 
common codons, taken together, is less than n/x, wherein n/x is a positive integer, n is the 
number of codons in the synthetic nucleic acid sequence and x is chosen from 2, 4, 6, 10, 15, 20, 
50, 150, 250, 500 and 1000. (Fractional values for n/x are rounded to the next highest or lowest 
integer: positive values below 0.5 are rounded down and values above 0.5 are rounded up.) 
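
The n/x limit and its rounding rule can be illustrated with a short sketch. The text does not say how a fractional part of exactly 0.5 is treated; the code below assumes it is rounded up, and that assumption is marked in a comment. The function names `rounded_threshold` and `meets_limit` are hypothetical.

```python
# Values of x permitted by the text.
ALLOWED_X = (2, 4, 6, 10, 15, 20, 50, 150, 250, 500, 1000)


def rounded_threshold(n: int, x: int) -> int:
    """Round n/x to the nearest integer using integer arithmetic only."""
    if x not in ALLOWED_X:
        raise ValueError(f"x must be one of {ALLOWED_X}")
    # ASSUMPTION: a fractional part of exactly 0.5 is rounded up
    # (the text only covers values below and above 0.5).
    return (2 * n + x) // (2 * x)


def meets_limit(n_codons: int, n_not_common: int, x: int) -> bool:
    """True if the count of non-common plus less-common codons is below n/x."""
    limit = rounded_threshold(n_codons, x)
    # The text requires n/x to be a positive integer, so a limit of 0 is rejected.
    return limit > 0 and n_not_common < limit


print(rounded_threshold(100, 6))   # 100/6 is about 16.67
print(meets_limit(100, 4, 20))     # the limit is 100/20 = 5
```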

The synthetic nucleic acid can direct the synthesis of an optimized messenger mRNA. In 
a preferred embodiment the continuous stretch of common codons can include: the sequence of a 
pre-pro-protein; the sequence of a pro-protein; the sequence of a mature protein; the "pre" 
sequence of a pre-pro-protein; the "pre-pro" sequence of a pre-pro-protein; the "pro" sequence of 
a pre-pro or a pro-protein; or a portion of any of the aforementioned sequences. 

In a preferred embodiment, the number of codons in the synthetic nucleic acid sequence 
(n) is at least 50, 60, 70, 80, 90, 100, 120, 150, 200, 350, 400, 500 or more. 

In a preferred embodiment, the number of non-common or less-common codons replaced 
is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In a preferred embodiment, the number of non-common or less-common codons 
remaining is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In preferred embodiments, the non-common and less-common codons replaced, taken 
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic 
acid sequence. 

In preferred embodiments, the non-common and less-common codons remaining, taken 
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic 
acid sequence. 

In a preferred embodiment, all non-common or less-common codons are replaced with 
common codons. 

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the 
codons in the synthetic nucleic acid sequence are common codons. Preferably, all of the codons 
in the synthetic nucleic acid sequence are common codons. 

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a 
mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human 
protein. 

In another aspect, the invention features, a synthetic nucleic acid sequence which encodes 
a protein, or a portion thereof, wherein at least one non-common codon or less-common codon 
has been replaced by a common codon in the sequence that has not been optimized (non- 
optimized) which encodes the protein, wherein at least 94% or more of the codons in the 
sequence encoding the protein are common codons and wherein the synthetic nucleic acid 
sequence encodes a protein of at least about 90, 100 or 120 amino acids in length. 
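
The two conditions of this aspect (a minimum share of common codons and a minimum protein length) can be tested together. The following is a sketch under stated assumptions: `COMMON` is a hypothetical illustrative codon set, `meets_common_codon_threshold` is a hypothetical name, and the input is assumed to be a coding sequence without its stop codon.

```python
# Hypothetical set of "common" codons, for illustration only.
COMMON = {"ATG", "GCC", "CTG", "AAG", "GGC", "GAG", "ACC", "TGG"}


def meets_common_codon_threshold(cds: str, min_percent: int = 94,
                                 min_length: int = 90, common=COMMON) -> bool:
    """Check that the sequence is long enough and sufficiently rich in common codons."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if len(codons) < min_length:
        return False
    n_common = sum(codon in common for codon in codons)
    # Integer comparison avoids floating-point issues at the exact boundary.
    return n_common * 100 >= min_percent * len(codons)


# 94 common codons out of 100 sits exactly on the 94% boundary.
print(meets_common_codon_threshold("GCC" * 94 + "GCG" * 6))
```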

The synthetic nucleic acid can direct the synthesis of an optimized messenger mRNA. In 
a preferred embodiment the continuous stretch of common codons can include: the sequence of a 
pre-pro-protein; the sequence of a pro-protein; the sequence of a mature protein; the "pre" 
sequence of a pre-pro-protein; the "pre-pro" sequence of a pre-pro-protein; the "pro" sequence of 
a pre-pro or a pro-protein; or a portion of any of the aforementioned sequences. 

In preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, 99.5% or more of 
non-common or less-common codons in the non-optimized nucleic acid sequence encoding the 
protein have been replaced by a common codon encoding the same amino acid. Preferably, all 
non-common or less-common codons are replaced by a common codon encoding the same amino 
acid as found in the non-optimized sequence. 

In a preferred embodiment, the synthetic nucleic acid sequence encodes a protein of at 
least about 90, 95, 100, 105, 110, 120, 130, 150, 200, 500, 700, 1000 or more amino acids in 
length. 

In other preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 98.5%, 99%, 99.5% 
of the non-common codons in the non-optimized nucleic acid sequence are replaced with 
common codons. Preferably, all of the non-common codons are replaced with the common 
codons. 

In other preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 98.5%, 99%, 99.5% of 
the less-common codons in the non-optimized nucleic acid sequence are replaced with common 
codons. Preferably, all of the less-common codons are replaced with the common codons. 

In preferred embodiments, at least 94% or more of the non-common and less-common 
codons are replaced with common codons. 

In preferred embodiments, the number of codons replaced which are not common codons 
is equal to or less than 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1. 

In preferred embodiments, the number of codons remaining which are not common 
codons is equal to or less than 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1. 

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a 
mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human 
protein. 

The synthetic nucleic acid can direct the synthesis of an optimized messenger mRNA. In 
a preferred embodiment the continuous stretch of common codons can include: the sequence of a 
pre-pro-protein; the sequence of a pro-protein; the sequence of a mature protein; the "pre" 
sequence of a pre-pro-protein; the "pre-pro" sequence of a pre-pro-protein; the "pro" sequence of 
a pre-pro or a pro-protein; or a portion of any of the aforementioned sequences. 

In a preferred embodiment the synthetic nucleic acid sequence is at least 100, 110, 120, 
150, 200, 300, 500, 700, 1000 or more base pairs in length. 

In another aspect, the invention features a synthetic nucleic acid sequence that directs the synthesis of an 
optimized message which encodes a Factor VIII protein having one or more of the following 
characteristics: 

a) the B domain is deleted (BDD Factor VIII); 

b) the synthetic nucleic acid sequence has a recognition site for an intracellular 
protease of the PACE/furin class, e.g., X-Arg-X-X-Arg (Molloy et al., J. Biol. Chem. 
267:16396-16401, 1992); a short-peptide linker, e.g., a two peptide linker, e.g., a leucine-glutamic 
acid peptide linker (LE), a three, or a four peptide linker, inserted at the heavy-light chain 
junction. 

c) the synthetic nucleic acid sequence is introduced into a cell, e.g., a primary cell, a 
secondary cell, a transformed or an immortalized cell line. Examples of an immortalized human 
cell line useful in the present method include, but are not limited to: a Bowes Melanoma cell 
(ATCC Accession No. CRL 9607), a Daudi cell (ATCC Accession No. CCL 213), a HeLa cell 
and a derivative of a HeLa cell (ATCC Accession Nos. CCL 2, CCL2.1, and CCL 2.2), a HL-60 
cell (ATCC Accession No. CCL 240), a HT1080 cell (ATCC Accession No. CCL 121), a Jurkat 
cell (ATCC Accession No. TIB 152), a KB carcinoma cell (ATCC Accession No. CCL 17), a K- 
562 leukemia cell (ATCC Accession No. CCL 243), a MCF-7 breast cancer cell (ATCC 
Accession No. BTH 22), a MOLT-4 cell (ATCC Accession No. 1582), a Namalwa cell (ATCC 
Accession No. CRL 1432), a Raji cell (ATCC Accession No. CCL 86), a RPMI 8226 cell 
(ATCC Accession No. CCL 155), a U-937 cell (ATCC Accession No. CRL 1593), WI-38VA13 
sub line 2R4 cells (ATCC Accession No. CCL 75.1), a CCRF-CEM cell (ATCC Accession No. 
CCL 119) and a 2780AD ovarian carcinoma cell (Van Der Blick et al., Cancer Res. 48: 
5927-5932, 1988), as well as heterohybridoma cells produced by fusion of human cells and cells of 
another species. In another embodiment, the immortalized cell line can be a cell line other than a 
human cell line, e.g., a CHO cell line. In a preferred embodiment, the cell is a non-transformed 
cell. In various preferred embodiments, the cell is a mammalian cell, e.g., a primary or 
secondary mammalian cell, e.g., a fibroblast, a hematopoietic stem cell, a myoblast, a 
keratinocyte, an epithelial cell, an endothelial cell, a glial cell, a neural cell, a cell comprising a 
formed element of the blood, a muscle cell and precursors of these somatic cells. In a most 
preferred embodiment, the cell is a secondary human fibroblast. 

In a preferred embodiment, the synthetic nucleic acid sequence which encodes a Factor 
VIII protein has at least one, preferably at least two, and most preferably, all of the 
characteristics a, b, and c described above. 

In preferred embodiments, at least one non-common codon or less-common codon of the 
synthetic nucleic acid has been replaced by a common codon and the synthetic nucleic acid has 
one or more of the following properties: it has a continuous stretch of at least 90 codons all of 
which are common codons; it has a continuous stretch of common codons which comprise at 
least 33% of the codons of the synthetic nucleic acid sequence; at least 94% or more of the 
codons in the sequence encoding the protein are common codons and the synthetic nucleic acid 
sequence encodes a protein of at least about 90, 100, or 120 amino acids in length; it is at least 80 
base pairs in length and is free of unique restriction endonuclease sites that would occur in 
the message optimized sequence. 

In a preferred embodiment, the number of non-common or less-common codons replaced 
is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In a preferred embodiment, the number of non-common or less-common codons 
remaining is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In preferred embodiments, the non-common and less-common codons replaced, taken 
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic 
acid sequence. 

In preferred embodiments, the non-common and less-common codons remaining, taken 
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic 
acid sequence. 

In a preferred embodiment, all non-common or less-common codons are replaced with 
common codons. 

In a preferred embodiment, all non-common and less-common codons are replaced with 
common codons. 

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the 
codons in the synthetic nucleic acid sequence are common codons. 

Preferably, all of the codons in the synthetic nucleic acid sequence are common codons. 

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a 
mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human 
protein. 

In a preferred embodiment, the synthetic nucleic acid sequence includes a continuous 
stretch of common codons wherein the continuous stretch comprises at least 35%, 40%, 50%, 
60%, 70%, 80%, 90%, 95% or 100% of codons in the synthetic nucleic acid sequence. 

In another aspect, the invention features, a synthetic nucleic acid sequence which can 
direct the synthesis of an optimized message which encodes a Factor IX protein having one or 
more of the following characteristics: 

a) it has a PACE/furin site, such as an X-Arg-X-X-Arg site, at a pro-peptide mature 
protein junction; or 

b) is inserted, e.g., via transfection, into a non-transformed cell, e.g., a primary or 
secondary cell, e.g., a primary human fibroblast. 
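
A motif of the form X-Arg-X-X-Arg can be located in a protein sequence with a regular expression. The sketch below assumes one-letter amino acid codes (R for arginine, any residue for X); the function name `find_pace_furin_sites` and the example sequence are made up for illustration and do not come from this document.

```python
import re

# X-Arg-X-X-Arg in one-letter code: any residue, R, any two residues, R.
# The lookahead lets overlapping occurrences be reported as well.
PACE_FURIN_MOTIF = re.compile(r"(?=(.R..R))")


def find_pace_furin_sites(protein: str):
    """Return (0-based start index, matched residues) for every motif occurrence."""
    return [(m.start(), m.group(1))
            for m in PACE_FURIN_MOTIF.finditer(protein.upper())]


# Made-up sequence containing a single T-R-A-K-R occurrence.
print(find_pace_furin_sites("MKTRAKRSLE"))
```

A hit only shows that the sequence pattern is present; whether a site is actually cleaved depends on context that a pattern match cannot capture.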

In a preferred embodiment, the synthetic nucleic acid sequence which encodes a Factor IX 
protein has at least one, and preferably, both of the characteristics a and b described above. 

In preferred embodiments, at least one non-common codon or less-common codon of the 
synthetic nucleic acid has been replaced by a common codon and the synthetic nucleic acid has 
one or more of the following properties: it has a continuous stretch of at least 90 codons all of 
which are common codons; it has a continuous stretch of common codons which comprise at 
least 33% of the codons of the synthetic nucleic acid sequence; at least 94% or more of the 
codons in the sequence encoding the protein are common codons and the synthetic nucleic acid 
sequence encodes a protein of at least about 90, 100, or 120 amino acids in length; it is at least 80 
base pairs in length and is free of unique restriction endonuclease sites that occur in the message 
optimized sequence. 

In a preferred embodiment, the number of non-common or less-common codons replaced 
is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In a preferred embodiment, the number of non-common or less-common codons 
remaining is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In preferred embodiments, the non-common and less-common codons replaced, taken 
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic 
acid sequence. 

In preferred embodiments, the non-common and less-common codons remaining, taken 
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic 
acid sequence. 

In a preferred embodiment, all non-common or less-common codons are replaced with 
common codons. 

In a preferred embodiment, all non-common and less-common codons are replaced with 
common codons. 

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the 
codons in the synthetic nucleic acid sequence are common codons. 

Preferably, all of the codons in the synthetic nucleic acid sequence are common codons. 

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a 
mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human 
protein. 

In a preferred embodiment, the synthetic nucleic acid sequence includes a continuous 
stretch of common codons wherein the continuous stretch comprises at least 35%, 40%, 50%, 
60%, 70%, 80%, 90%, 95% or 100% of codons in the synthetic nucleic acid sequence. 

In another aspect, the invention features, a plasmid or a DNA construct, e.g., an 
expression plasmid or a DNA construct, which includes a synthetic nucleic acid sequence 
described herein. 

In yet another aspect, the invention features, a synthetic nucleic acid sequence described 
herein introduced into the genome of an animal cell. In a preferred embodiment, the animal cell 
is a primate cell, e.g., a mammal cell, e.g., a human cell. 

In still another aspect, the invention features, a cell harboring a synthetic nucleic acid 
sequence described herein, e.g., a cell from a primary or secondary cell strain, or a cell from a 
continuous cell line, e.g., a Bowes Melanoma cell (ATCC Accession No. CRL 9607), a Daudi 
cell (ATCC Accession No. CCL 213), a HeLa cell and a derivative of a HeLa cell (ATCC 
Accession Nos. CCL 2, CCL2.1, and CCL 2.2), a HL-60 cell (ATCC Accession No. CCL 240), a 
HT1080 cell (ATCC Accession No. CCL 121), a Jurkat cell (ATCC Accession No. TIB 152), a 
KB carcinoma cell (ATCC Accession No. CCL 17), a K-562 leukemia cell (ATCC Accession 
No. CCL 243), a MCF-7 breast cancer cell (ATCC Accession No. BTH 22), a MOLT-4 cell 
(ATCC Accession No. 1582), a Namalwa cell (ATCC Accession No. CRL 1432), a Raji cell 
(ATCC Accession No. CCL 86), a RPMI 8226 cell (ATCC Accession No. CCL 155), a U-937 
cell (ATCC Accession No. CRL 1593), a WI-38VA13 sub line 2R4 cell (ATCC Accession No. 
CCL 75.1), a CCRF-CEM cell (ATCC Accession No. CCL 119) and a 2780AD ovarian 
carcinoma cell (Van Der Blick et al., Cancer Res. 48: 5927-5932, 1988), as well as 
heterohybridoma cells produced by fusion of human cells and cells of another species. In another 
embodiment, the immortalized cell line can be a cell line other than a human cell line, e.g., a 
CHO cell line. In a preferred embodiment, the cell is a non-transformed cell. In various preferred 
embodiments, the cell is a mammalian cell, e.g., a primary or secondary mammalian cell, e.g., a 
fibroblast, a hematopoietic stem cell, a myoblast, a keratinocyte, an epithelial cell, an endothelial 
cell, a glial cell, a neural cell, a cell comprising a formed element of the blood, a muscle cell and 
precursors of these somatic cells. In a most preferred embodiment, the cell is a secondary 
human fibroblast. 

In another aspect, the invention features, a method for preparing a synthetic nucleic acid 
sequence encoding a protein which is, preferably, at least 90 codons in length, e.g., a synthetic 
nucleic acid sequence described herein. The method includes identifying non-common and 
less-common codons in the non-optimized gene encoding the protein and replacing at least 94%, 
95%, 96%, 97%, 98%, 99% or more of the non-common and less-common codons with a 
common codon encoding the same amino acid as the replaced codon. Preferably, all 
non-common and less-common codons are replaced with common codons. 
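
The share of codons actually replaced can be measured by comparing the non-optimized and synthetic sequences position by position. This is a sketch only: `COMMON` is a hypothetical illustrative codon set, `count_replaced` is a hypothetical name, and the code does not verify that each replacement encodes the same amino acid.

```python
# Hypothetical set of "common" codons, for illustration only.
COMMON = {"ATG", "GCC", "CTG", "AAG", "GGC", "GAG", "ACC", "TGG"}


def split_codons(cds: str):
    """Split a coding sequence into codons, ignoring a trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def count_replaced(original: str, synthetic: str, common=COMMON):
    """Return (replaced, total), where total counts codons in the original
    sequence that are not common and replaced counts how many of those
    positions hold a common codon in the synthetic sequence."""
    orig, syn = split_codons(original), split_codons(synthetic)
    if len(orig) != len(syn):
        raise ValueError("sequences must contain the same number of codons")
    targets = [i for i, codon in enumerate(orig) if codon not in common]
    replaced = sum(syn[i] in common for i in targets)
    return replaced, len(targets)


replaced, total = count_replaced("GCGGCGGCCGCA", "GCCGCGGCCGCC")
# Integer comparison against the 94% figure avoids floating-point rounding.
print(replaced, total, replaced * 100 >= 94 * total)
```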

In a preferred embodiment, the synthetic nucleic acid sequence encodes a protein of at 
least about 90, 95, 100, 105, 110, 120, 130, 150, 200, 500, 700, 1000 or more codons in length. 

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a 
mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human 
protein. 

In another aspect, the invention features, a method for making a nucleic acid sequence 
which directs the synthesis of an optimized message of a protein of at least 90, 100, or 120 amino 
acids in length, e.g., a synthetic nucleic acid sequence described herein. The method includes: 
synthesizing at least two fragments of the nucleic acid sequence, wherein the two fragments 
encode adjoining portions of the protein and wherein both fragments are mRNA optimized, e.g., 
as described herein; and joining the two fragments such that a non-common codon is not created 
at a junction point, thereby making the mRNA optimized nucleic acid sequence. 
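
The junction check in this method can be sketched as follows. The example assumes that a fragment boundary may fall inside a codon, so the codon containing the first base of the second fragment is the one inspected. `COMMON` is a hypothetical illustrative codon set and `join_fragments` is a hypothetical name.

```python
# Hypothetical set of "common" codons, for illustration only.
COMMON = {"ATG", "GCC", "CTG", "AAG", "GGC", "GAG", "ACC", "TGG"}


def join_fragments(left: str, right: str, common=COMMON) -> str:
    """Concatenate two adjoining fragments and reject the result if the
    codon at the junction is not a common codon."""
    joined = (left + right).upper()
    if len(joined) % 3 != 0:
        raise ValueError("joined sequence is not a whole number of codons")
    # Start of the codon that contains the first base of `right`.
    start = (len(left) // 3) * 3
    junction_codon = joined[start:start + 3]
    if junction_codon not in common:
        raise ValueError(f"junction codon {junction_codon!r} is not a common codon")
    return joined


# The boundary falls inside the second codon: "GC" + "C" gives GCC.
print(join_fragments("ATGGC", "CCTG"))
```

Only the junction codon is inspected here, on the assumption that each fragment was already optimized before joining.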

In a preferred embodiment, the two fragments are joined together such that a unique 
restriction endonuclease site used to create the two fragments is not recreated at the junction 
point. In another preferred embodiment, the two fragments are joined together such that a 
unique restriction site is created. 

In a preferred embodiment, the synthetic nucleic acid sequence encodes a protein of at 
least about 90, 95, 100, 105, 110, 120, 130, 150, 200, 500, 700, 1000 or more codons in length. 

In a preferred embodiment, at least 3, 4, 5, 6, 7, 8, 9, 10 or more fragments of the nucleic 
acid sequence are synthesized. 

In a preferred embodiment, the fragments are joined together by a fusion, e.g., a blunt end 
fusion. 

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the 
codons in the synthetic nucleic acid sequence are common codons. Preferably, all of the codons 
in the synthetic nucleic acid sequence are common codons. 

In preferred embodiments, the number of codons which are not common codons is equal 
to or less than 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1. 

In preferred embodiments, each fragment is at least 30, 40, 50, 75, 100, 120, 150 or more 
codons in length. 

In another aspect, the invention features, a method of providing a subject, e.g., a human, 
with a protein. The method includes: providing a synthetic nucleic acid sequence that can 
direct the synthesis of an optimized message for a protein, e.g., a synthetic nucleic acid sequence 
described herein; introducing the synthetic nucleic acid sequence that directs the synthesis of an 
optimized message for a protein into the subject; and allowing the subject to express the protein, 
thereby providing the subject with the protein. 

In preferred embodiments, the method further includes inserting the nucleic acid 
sequence that can direct the synthesis of an optimized message into a cell. The cell can be an 
autologous, allogeneic, or xenogeneic cell, but is preferably autologous. A preferred cell is a 
fibroblast, a hematopoietic stem cell, a myoblast, a keratinocyte, an epithelial cell, an 
endothelial cell, a glial cell, a neural cell, a cell comprising a formed element of the blood, a 
muscle cell and precursors of these somatic cells. The mRNA optimized synthetic nucleic acid 
sequence can be inserted into the cell ex vivo or in vivo. If inserted ex vivo, the cell can be 
introduced into the subject. 

In preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the 
codons in the synthetic nucleic acid sequence are common codons. Preferably, all of the codons 
in the synthetic nucleic acid sequence are common codons. 

In preferred embodiments, the number of codons which are not common codons is equal 
to or less than 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1. 

The invention also features synthetic nucleic acid fragments which encode a portion of a 
protein. Such synthetic nucleic acid fragments are similar to the synthetic nucleic acid sequences 
of the invention except that they encode only a portion of a protein. Such nucleic acid fragments 
preferably encode at least 50, 60, 70, 80, 100, 110, 120, 130, 150, 200, 300, 400, 500, or more 
contiguous amino acids of the protein. 

The invention also features transfected or infected primary and secondary somatic cells of 
vertebrate origin, particularly of mammalian origin, e.g., of human, mouse, or rabbit origins, e.g., 
primary human cells, secondary human cells, or primary or secondary rabbit cells. The cells are 
transfected or infected with exogenous synthetic nucleic acid, e.g., DNA, described herein. The 
synthetic nucleic acid can encode a protein, e.g., a therapeutic protein, e.g., an enzyme, a 
cytokine, a hormone, an antigen, an antibody, a clotting factor, e.g., Factor VIII, Factor IX, or a 
regulatory protein. The invention also includes methods by which primary and secondary cells 
are transfected or infected to include exogenous synthetic DNA, methods of producing clonal 
cell strains or heterogenous cell strains, and methods of gene therapy in which the transfected or 
infected primary or secondary cells are used. The synthetic nucleic acid directs the synthesis of 
an optimized message, e.g., an optimized message as described herein. 

The present invention includes primary and secondary somatic cells, which have been 
transfected or infected with an exogenous synthetic nucleic acid described herein, which is stably 
integrated into their genomes or is expressed in the cells episomally. In preferred embodiments 
the cells are fibroblasts, keratinocytes, epithelial cells, endothelial cells, glial cells, neural cells, 
cells comprising a formed element of the blood, muscle cells, other somatic cells which can be 
cultured, or somatic cell precursors. The resulting cells are referred to, respectively, as 
transfected or infected primary cells and transfected or infected secondary cells. The exogenous 
synthetic DNA encodes a protein, or a portion thereof, e.g., a therapeutic protein (e.g., Factor 
VIII or Factor IX). In an embodiment in which the exogenous synthetic DNA encodes a protein, 
or a portion thereof, to be expressed by the recipient cells, the resulting protein can be retained 
within the cell, incorporated into the cell membrane or secreted from the cell. In this 
embodiment, the exogenous synthetic DNA encoding the protein is introduced into cells along 
with additional DNA sequences sufficient for expression of the exogenous synthetic DNA in the 
cells. The additional DNA sequences may be of viral or non-viral origin. Primary cells modified 
to express exogenous synthetic DNA are referred to herein as transfected or infected primary 
cells, which include cells removed from tissue and placed on culture medium for the first time. 
Secondary cells modified to express or render available exogenous DNA are referred to herein as 
transfected or infected secondary cells. 

Primary and secondary cells transfected or infected by the subject method, e.g., cloned 
cell strains, can be seen to fall into three types or categories: 1) cells which do not, as obtained, 
make or contain the therapeutic protein, 2) cells which make or contain the therapeutic protein 
but in lower quantities than normal (in quantities less than the physiologically normal lower 
level) or in defective form, and 3) cells which make the therapeutic protein at physiologically 
normal levels, but are to be augmented or enhanced in their content or production. Examples of 
proteins that can be made by the present method include cytokines or clotting factors. 

Exogenous synthetic DNA is introduced into primary or secondary cells by a variety of 
techniques. For example, a DNA construct which includes exogenous synthetic DNA encoding 
a therapeutic protein and additional DNA sequences necessary for expression in recipient cells 
can be introduced into primary or secondary cells by electroporation, microinjection, or other 
means (e.g., calcium phosphate precipitation, modified calcium phosphate precipitation, 
polybrene precipitation, liposome fusion, receptor-mediated DNA delivery). Alternatively, a 
vector, such as a retroviral or other vector which includes exogenous synthetic DNA can be used 
and cells can be genetically modified as a result of infection with the vector. 

In addition to the exogenous synthetic DNA, transfected or infected primary and 
secondary cells may optionally contain DNA encoding a selectable marker, which is expressed 
and confers upon recipients a selectable phenotype, such as antibiotic resistance, resistance to a 
cytotoxic agent, nutritional prototrophy or expression of a surface protein. Its presence makes it 
possible to identify and select cells containing the exogenous DNA. A variety of selectable 
marker genes can be used, such as neo, gpt, dhfr, ada, pac, hyg, mdr and hisD. 

Transfected or infected cells of the present invention are useful, as populations of 
transfected or infected primary cells or secondary cells, transfected or infected clonal cell strains, 
transfected or infected heterogenous cell strains, and as cell mixtures in which at least one 
representative cell of one of the three preceding categories of transfected or infected cells is 
present, (e.g., the mixture of cells contains essentially transfected or infected primary or 
secondary cells and may include untransfected or uninfected primary or secondary cells) as a 
delivery system for treating an individual with an abnormal or undesirable condition which 
responds to delivery of a therapeutic protein, which is either: 1) a therapeutic protein (e.g., a 
protein which is absent, underproduced relative to the individual's physiologic needs, defective, 
or inefficiently or inappropriately utilized in the individual), e.g., Factor VIII; or 2) a therapeutic 
protein with novel functions, such as enzymatic or transport functions. In the method of the 
present invention of providing a therapeutic protein, transfected or infected primary cells or 
secondary cells, clonal cell strains or heterogenous cell strains, are administered to an individual 
in whom the abnormal or undesirable condition is to be treated or prevented, in sufficient quantity 
and by an appropriate route, to express the exogenous synthetic DNA at physiologically 
relevant levels. A physiologically relevant level is one which either approximates the level at 
which the product is produced in the body or results in improvement of the abnormal or 
undesirable condition. 

Clonal cell strains of transfected or infected secondary cells (referred to as transfected or 
infected clonal cell strains) expressing exogenous synthetic DNA (and, optionally, including a 
selectable marker gene) can be produced by the method of the present invention. The method 
includes the steps of: 1) providing a population of primary cells, obtained from the individual to 
whom the transfected or infected primary cells will be administered or from another source; 2) 
introducing into the primary cells or into secondary cells derived from primary cells a DNA 
construct which includes exogenous DNA as described above and the necessary additional DNA 
sequences described above, producing transfected or infected primary or secondary cells; 3) 
maintaining transfected or infected primary or secondary cells under conditions appropriate for 
their propagation; 4) identifying a transfected or infected primary or secondary cell; and 5) 
producing a colony from the transfected or infected primary or secondary cell identified in (4) by 
maintaining it under appropriate culture conditions until a desired number of cells is obtained. 
The desired number of clonal cells is a number sufficient to provide a therapeutically effective 
amount of product when administered to an individual, e.g., an individual with hemophilia A is 
provided with a population of cells that produce a therapeutically effective amount of Factor 
VIII, such that the condition is treated. The number of cells required for a given therapeutic 
dose depends on several factors including the expression level of the protein, the condition of the 
host animal and the limitations associated with the implantation procedure. In general, the 
number of cells required for implantation is in the range of 1x10^6 to 5x10^9, and preferably 
1x10 to 5x10 . In one embodiment of the method, the cell identified in (4) undergoes 
approximately 27 doublings (i.e., undergoes 27 cycles of cell growth and cell division) to 
produce 100 million clonal transfected or infected cells. In another embodiment of the method, 
exogenous synthetic DNA is introduced into genomic DNA by homologous recombination 
between DNA sequences present in the DNA construct and genomic DNA. In another 
embodiment, the exogenous synthetic DNA is present episomally in a transfected cell, e.g., 
primary or secondary cell. 
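
The relationship between 27 doublings and roughly 100 million cells can be checked with simple arithmetic. The following is a minimal sketch, assuming an idealized culture in which every cell divides at each doubling and no cells are lost; the function names are hypothetical and used only for illustration.

```python
def cells_after_doublings(doublings: int, founders: int = 1) -> int:
    """Cell count after a number of population doublings, assuming every
    cell divides at each doubling and none are lost (an idealization)."""
    return founders * 2 ** doublings


def doublings_needed(target_cells: int, founders: int = 1) -> int:
    """Smallest number of doublings that yields at least target_cells."""
    doublings = 0
    while cells_after_doublings(doublings, founders) < target_cells:
        doublings += 1
    return doublings


print(cells_after_doublings(27))      # 134217728
print(doublings_needed(100_000_000))  # 27
```

Under this idealization a single founder cell yields about 1.3x10^8 cells after 27 doublings, which is consistent with the figure of approximately 100 million cells.
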

In one embodiment of producing a clonal population of transfected secondary cells, a cell 
suspension containing primary or secondary cells is combined with exogenous synthetic DNA 
encoding a therapeutic protein and DNA encoding a selectable marker, such as the neo gene. 
The two DNA sequences are present on the same DNA construct or on two separate DNA 
constructs. The resulting combination is subjected to electroporation, generally at 250-300 volts 
with a capacitance of 960 µFarads and an appropriate time constant (e.g., 14 to 20 msec) for 
cells to take up the DNA construct. In an alternative embodiment, microinjection is used to 
introduce the DNA construct into primary or secondary cells. In either embodiment, 
introduction of the exogenous DNA results in production of transfected primary or secondary 
cells. The exogenous synthetic DNA introduced into the cell can be stably integrated into 
genomic DNA or can be present episomally in the cell. 

In the method of producing heterogenous cell strains of the present invention, the same 
steps are carried out as described for production of a clonal cell strain, except that a single 
transfected primary or secondary cell is not isolated and used as the founder cell. Instead, two or 
more transfected primary or secondary cells are cultured to produce a heterogenous cell strain. A 
heterogenous cell strain can also contain in addition to two or more transfected primary or 
secondary cells, untransfected primary or secondary cells. 

The methods described herein have wide applicability in treating abnormal or undesired 
conditions and can be used to provide a variety of proteins in an effective amount to an 
individual. For example, they can be used to provide secreted proteins (with either 
predominantly systemic or predominantly local effects, e.g., Factor VIII and Factor IX), 
membrane proteins (e.g., for imparting new or enhanced cellular responsiveness, facilitating 
removal of a toxic product or for marking or targeting to a cell) or intracellular proteins (e.g., for 
affecting gene expression or producing autocrine effects). 

A method described herein is particularly advantageous in treating abnormal or undesired 
conditions in that it: 1) is curative (one gene therapy treatment has the potential to last a patient's 
lifetime); 2) allows precise dosing (the patient's cells continuously determine and deliver the 
optimal dose of the required protein based on physiologic demands, and the stably transfected or 
infected cell strains can be characterized extensively in vitro prior to implantation, leading to 
accurate predictions of long term function in vivo); 3) is simple to apply in treating patients; 4) 
eliminates issues concerning patient compliance (following a one-time gene therapy treatment, 
daily protein injections are no longer necessary); and 5) reduces treatment costs (since the 
therapeutic protein is synthesized by the patient's own cells, investment in costly protein production 
and purification is unnecessary). 

As used herein, the term "optimized messenger RNA" refers to a synthetic nucleic acid 
sequence encoding a protein wherein at least one non-common codon or less-common codon in 
the sequence encoding the protein has been replaced with a common codon. 

By "common codon" is meant the most common codon representing a particular amino 
acid in a human sequence. The codon frequency in highly expressed human genes is outlined 
below in Table 1. Common codons include: Ala (gcc); Arg (cgc); Asn (aac); Asp (gac); Cys 
(tgc); Gln (cag); Gly (ggc); His (cac); Ile (atc); Leu (ctg); Lys (aag); Pro (ccc); Phe (ttc); Ser 
(agc); Thr (acc); Tyr (tac); Glu (gag); and Val (gtg) (see Table 1). "Less-common codons" are 
codons that occur frequently in humans but are not the common codon: Gly (ggg); Ile (att); Leu 
(ctc); Ser (tcc); Val (gtc); and Arg (agg). All codons other than common codons and less- 
common codons are "non-common codons". 

TABLE 1: Codon Frequency in Highly Expressed Human Genes 

Amino acid    Codon    % occurrence 

Ala           GCC      53 
              GCT      17 
              GCA      13 
              GCG      17 

Arg           CGC      37 
              CGT       7 
              CGA       6 
              CGG      21 
              AGA      10 
              AGG      18 

Asn           AAC      78 
              AAT      25 

Leu           CTC      26 
              CTT       5 
              CTA       3 
              CTG      58 
              TTA       2 
              TTG       6 

Lys           AAA      18 
              AAG      82 

Pro           CCC      48 
              CCT      19 
              CCA      16 
              CCG      17 

Phe           TTC      80 
              TTT      20 

Cys           TGC      68 
              TGT      32 

Gln           CAA      12 
              CAG      88 

Glu           GAA      25 
              GAG      75 

Gly           GGC      50 
              GGT      12 
              GGA      14 
              GGG      24 

His           CAC      79 
              CAT      21 

Ile           ATC      77 
              ATT      18 
              ATA       5 

Ser           TCC      28 
              TCT      13 
              TCA       5 
              TCG       9 
              AGC      34 
              AGT      10 

Thr           ACC      57 
              ACT      14 
              ACA      14 
              ACG      15 

Tyr           TAC      74 
              TAT      26 

Val           GTC      25 
              GTT       7 
              GTA       5 
              GTG      64 

Codon frequency in Table 1 was calculated using the GCG program established by the 
University of Wisconsin Genetics Computer Group. Numbers represent the percentage of cases 
in which the particular codon is used. 
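
The definitions of common, less-common and non-common codons given above lend themselves to a simple lookup. The following is a minimal sketch in Python; the function names are hypothetical, and the codon sets are taken from the lists in the text. Note that, read literally, the definition places codons absent from both lists (including ATG and TGG, which have no synonymous alternative) in the non-common class.

```python
# Codon sets as listed in the text (DNA alphabet, upper case).
COMMON = {
    "GCC", "CGC", "AAC", "GAC", "TGC", "CAG", "GGC", "CAC", "ATC",
    "CTG", "AAG", "CCC", "TTC", "AGC", "ACC", "TAC", "GAG", "GTG",
}
LESS_COMMON = {"GGG", "ATT", "CTC", "TCC", "GTC", "AGG"}


def split_codons(seq: str) -> list[str]:
    """Split a coding sequence into codons; U is treated as T."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def classify_codon(codon: str) -> str:
    """Return 'common', 'less-common' or 'non-common' for one codon."""
    codon = codon.upper().replace("U", "T")
    if codon in COMMON:
        return "common"
    if codon in LESS_COMMON:
        return "less-common"
    return "non-common"  # literal reading: everything not listed above


def percent_common(seq: str) -> float:
    """Percentage of codons in seq that are common codons."""
    codons = split_codons(seq)
    return 100.0 * sum(classify_codon(c) == "common" for c in codons) / len(codons)


print(classify_codon("gcc"))           # common
print(percent_common("GCCGGGGCACTG"))  # 50.0
```
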

The term "primary cell" includes cells present in a suspension of cells isolated from a 
vertebrate tissue source (prior to their being plated i.e., attached to a tissue culture substrate such 
as a dish or flask), cells present in an explant derived from tissue, both of the previous types of 
cells plated for the first time, and cell suspensions derived from these plated cells. The term 
"secondary cell" or "cell strain" refers to cells at all subsequent steps in culturing. That is, the first 
time a plated primary cell is removed from the culture substrate and replated (passaged), it is 
referred to herein as a secondary cell, as are all cells in subsequent passages. Secondary cells are 
cell strains which consist of secondary cells which have been passaged one or more times. A cell 
strain consists of secondary cells that: 1) have been passaged one or more times; 2) exhibit a 
finite number of mean population doublings in culture; 3) exhibit the properties of contact- 
inhibited, anchorage dependent growth (anchorage-dependence does not apply to cells that are 
propagated in suspension culture); and 4) are not immortalized. A "clonal cell strain" is defined 
as a cell strain that is derived from a single founder cell. A "heterogenous cell strain" is defined 
as a cell strain that is derived from two or more founder cells. 

The term "transfected cell" refers to a cell into which an exogenous synthetic nucleic acid 
sequence, e.g., a sequence which encodes a protein, is introduced. Once in the cell, the synthetic 
nucleic acid sequence can integrate into the recipient cell's chromosomal DNA or can exist 
episomally. Standard transfection methods can be used to introduce the synthetic nucleic acid 
sequence into a cell, e.g., liposome-, polybrene-, or DEAE dextran-mediated transfection, 
electroporation, calcium phosphate precipitation or microinjection. The term "transfection" 
does not include delivery of DNA or RNA into a cell by a virus. 

The term "infected cell" refers to a cell into which an exogenous synthetic nucleic acid sequence, 
e.g., a sequence which encodes a protein, is introduced by a virus. Viruses known to be useful 
for gene transfer include an adenovirus, an adeno-associated virus, a herpes virus, a mumps 
virus, a poliovirus, a retrovirus, a Sindbis virus, a lentivirus and a vaccinia virus such as a canary 
pox virus. 

Other features and advantages of the invention will be apparent from the following 
detailed description and the claims. 

Detailed Description 
The drawings are first briefly described. 

Figure 1 is a schematic representation of domain structures of full-length and B-domain 
deleted human Factor VIII (hFVIII). 

Figure 2 is a schematic representation of full-length hFVIII. 

Figure 3 is a schematic representation of 5R BDD hFVIII expression plasmid pXF8.186. 

Figure 4 is a schematic representation of LE BDD hFVIII expression plasmid pXF8.61. 

Figure 5 is a schematic representation of the fourteen fragments (Fragment A-Fragment 
N) assembled to construct pXF8.61. 

Figure 6 is a schematic representation of the assembly of pXF8.61. 

Figure 7 depicts the nucleotide sequence and the corresponding amino acid sequence of 
the LE B-domain-deleted-Factor VIII (FVIII) insert contained in pAM1-1 (SEQ ID NO:1). 

Figure 8 is a schematic representation of the fragments assembled to construct pXF8.186. 

Figure 9 depicts the nucleotide sequence and the corresponding amino acid sequence of 
the 5Arg B-domain-deleted-FVIII insert (SEQ ID NO:2). 

Figure 10 is a schematic representation of the Factor VIII expression plasmid, pXF8.36. 
The cytomegalovirus immediate early I (CMV) promoter is depicted as a lightly shaded box. 
Positions of splice donor (SD) and splice acceptor (SA) sites are indicated below the shaded box. 
The Factor VIII cDNA sequence is depicted as a solid dark box. The hGH 3'UTS region is 
depicted as an open box. The neo expression cassette is depicted as a shaded box with an 
arrowhead which corresponds to the direction of transcription. The thin dark line represents the 
plasmid backbone sequences. The position and direction of transcription of the β-lactamase gene 
(amp) is indicated by the solid boxed arrow. 

Figure 11 is a schematic representation of the Factor VIII expression plasmid, 
pXF8.38. The cytomegalovirus immediate early I (CMV) promoter is depicted as a lightly 
shaded box. Positions of splice donor (SD) and splice acceptor (SA) sites are indicated below 
the shaded box. The Factor VIII cDNA sequence is depicted as a solid dark box. The hGH 
3'UTS region is depicted as an open box. The neo expression cassette is depicted as a shaded 
box with an arrowhead which corresponds to the direction of transcription. The thin dark line 
represents the plasmid backbone sequences. The position and direction of transcription of the 
β-lactamase gene (amp) is indicated by the solid boxed arrow. 

Figure 12 is a schematic representation of the Factor VIII expression plasmid, 
pXF8.269. The collagen (I) α2 promoter is depicted as a striped box. The region representing 
aldolase-derived 5' untranslated sequences is depicted as a lightly shaded box. Positions of 
splice donor (SD) and splice acceptor (SA) sites are indicated below the shaded box. The Factor 
VIII cDNA sequence is depicted as a solid dark box. The hGH 3'UTS region is depicted as an 
open box. The neo expression cassette is depicted as a shaded box with an arrowhead which 
corresponds to the direction of transcription. The thin dark line represents the plasmid backbone 
sequences. The position and direction of transcription of the β-lactamase gene (amp) is indicated 
by the solid boxed arrow. 

Figure 13 is a schematic representation of the Factor VIII expression plasmid, 
pXF8.224. The collagen (I) α2 promoter is depicted as a striped box. The region representing 
aldolase-derived 5' untranslated sequences is depicted as a lightly shaded box. Positions of 
splice donor (SD) and splice acceptor (SA) sites are indicated below the shaded box. The Factor 
VIII cDNA sequence is depicted as a solid dark box. The hGH 3'UTS region is depicted as an 
open box. The neo expression cassette is depicted as a shaded box with an arrowhead which 
corresponds to the direction of transcription. The thin dark line represents the plasmid backbone 
sequences. The position and direction of transcription of the β-lactamase gene (amp) is indicated 
by the solid boxed arrow. 

Message Optimization 

Methods of the invention are directed to optimized messages and synthetic nucleic acid 
sequences which direct the production of optimized mRNAs. An optimized mRNA can direct 
the synthesis of a protein of interest, e.g., a human protein, e.g., a human Factor VIII. A message 
for a protein of interest, e.g., human Factor VIII, can be optimized as described herein, e.g., by 
replacing at least 94%, 95%, 96%, 97%, 98%, 99%, and preferably all of the non-common 
codons or less-common codons with a common codon encoding the same amino acid as outlined 
in Table 1. 
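
Replacement of codons with the common codon encoding the same amino acid can be illustrated as follows. This is a minimal sketch, not a definitive implementation: it assumes the standard genetic code, uses the common codons listed in the text, and leaves codons with no listed common codon (Met, Trp and stop codons) unchanged. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Common codon for each amino acid, as listed in the text.
COMMON_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG", "P": "CCC",
    "F": "TTC", "S": "AGC", "T": "ACC", "Y": "TAC", "E": "GAG", "V": "GTG",
}


def split_codons(seq: str) -> list[str]:
    """Split a coding sequence into codons; U is treated as T."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq: str) -> str:
    """Translate a coding sequence; '*' marks a stop codon."""
    return "".join(GENETIC_CODE[c] for c in split_codons(seq))


def optimize_message(seq: str) -> str:
    """Replace every codon with the common codon for the same amino acid.
    Codons for Met, Trp and stop are kept as they are."""
    return "".join(COMMON_CODON.get(GENETIC_CODE[c], c) for c in split_codons(seq))


original = "ATGGCATTATCTTAA"
optimized = optimize_message(original)
print(optimized)                                    # ATGGCCCTGAGCTAA
print(translate(original) == translate(optimized))  # True
```

The final comparison confirms that the encoded amino acid sequence is unchanged by the substitution.
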

The coding region of a synthetic nucleic acid sequence can include the sequence "cg" 
without any discrimination, if the sequence is found in the common codon for that amino acid. 
Alternatively, the sequence "cg" can be limited in various regions, e.g., the first 20% of the 
coding sequence can be designed to have a low incidence of the sequence "cg". 
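
The incidence of the sequence "cg" in a leading portion of a coding sequence can be counted directly. The sketch below assumes that the "first 20%" is measured in whole codons from the 5' end; that interpretation, and the function names, are assumptions made for illustration.

```python
def count_cg(seq: str) -> int:
    """Number of 'CG' dinucleotides in seq."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def cg_in_leading_region(coding_seq: str, percent: int = 20) -> int:
    """Count 'CG' within the first `percent` of codons (rounded down)."""
    n_codons = len(coding_seq) // 3
    lead_codons = n_codons * percent // 100
    return count_cg(coding_seq[: lead_codons * 3])


seq = "GCGCGC" + "AAA" * 7 + "ACG"   # ten codons
print(count_cg(seq))                 # 3
print(cg_in_leading_region(seq))     # 2
```
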

Optimizing a message (and its synthetic DNA sequence) can negatively or positively 
affect gene expression or protein production. For example, replacing a less-common codon with 
a more common codon may affect the half life of the mRNA or alter its structure by introducing 
a secondary structure that interferes with translation of the message. It may therefore be 
necessary, in certain instances, to alter the optimized message. 

All or a portion of a message (or its gene) can be optimized. In some cases the desired 
modulation of expression is achieved by optimizing essentially the entire message. In other 
cases, the desired modulation will be achieved by optimizing part but not all of the message or 
gene. 

The codon usage of any coding sequence can be adjusted to achieve a desired property, 
for example high levels of expression in a specific cell type. The starting point for such an 
optimization may be a coding sequence with 100% common codons, or a coding sequence which 
contains a mixture of common and non-common codons. 

Two or more candidate sequences that differ in their codon usage are generated and 
tested to determine if they possess the desired property. Candidate sequences may be evaluated 
initially by using a computer to search for the presence of regulatory elements, such as silencers 
or enhancers, and to search for the presence of regions of coding sequence which could be 
converted into such regulatory elements by an alteration in codon usage. Additional criteria may 
include enrichment for particular nucleotides, e.g., A, C, G or U, codon bias for a particular 
amino acid, or the presence or absence of particular mRNA secondary or tertiary structure. 
Adjustment to the candidate sequence can be made based on a number of such criteria. 
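
A computer-based initial evaluation of this kind can be sketched as follows. The example computes GC content and locates occurrences of a short motif in each candidate. The motif ATTTA (the DNA form of the AUUUA pentamer found in many AU-rich elements) is used purely as an illustration; the choice of motifs, the function names and the candidate sequences are assumptions and not part of the method as described.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of nucleotides in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def motif_positions(seq: str, motif: str) -> list[int]:
    """Zero-based start positions of every (possibly overlapping) match."""
    seq, motif = seq.upper(), motif.upper()
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


def screen(candidates: dict[str, str], motifs: list[str]) -> dict[str, dict]:
    """Summarize GC content and motif hits for each named candidate."""
    return {
        name: {
            "gc": round(gc_fraction(seq), 3),
            "hits": {m: motif_positions(seq, m) for m in motifs},
        }
        for name, seq in candidates.items()
    }


candidates = {
    "candidate_1": "CCATTTAGGATTTA",   # toy sequences, not real messages
    "candidate_2": "CCACCCAGGACCCA",
}
for name, summary in screen(candidates, ["ATTTA"]).items():
    print(name, summary)
```
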

Promising candidate sequences are constructed and then evaluated experimentally. 
Multiple candidates may be evaluated independently of each other, or the process can be 
iterative, either by using the most promising candidate as a new starting point, or by combining 
regions of two or more candidates to produce a novel hybrid. Further rounds of modification and 
evaluation can be included. 

Modifying the codon usage of a candidate sequence can result in the creation or 
destruction of either a positive or negative element. In general, a positive element refers to any 
element whose alteration or removal from the candidate sequence could result in a decrease in 
expression of the therapeutic protein, or whose creation could result in an increase in expression 
of a therapeutic protein. For example, a positive element can include an enhancer, a promoter, a 
downstream promoter element, a DNA binding site for a positive regulator (e.g., a transcriptional 
activator), or a sequence responsible for imparting or removing mRNA secondary or tertiary 
structure. A negative element refers to any element whose alteration or removal from the 
candidate sequence could result in an increase in expression of the therapeutic protein, or whose 
creation would result in a decrease in expression of the therapeutic protein. A negative element 
includes a silencer, a DNA binding site for a negative regulator (e.g., a transcriptional repressor), 
a transcriptional pause site, or a sequence that is responsible for imparting or removing mRNA 
secondary or tertiary structure. In general, a negative element arises more frequently than a 
positive element. Thus, any change in codon usage that results in an increase in protein 
expression is more likely to have arisen from the destruction of a negative element rather than 
the creation of a positive element. In addition, alteration of the candidate sequence is more likely 
to destroy a positive element than create a positive element. In one embodiment, a candidate 
sequence is chosen and modified so as to increase the production of a therapeutic protein. The 
candidate sequence can be modified, e.g., by sequentially altering the codons or by randomly 
altering the codons in the candidate sequence. A modified candidate sequence is then evaluated 
by determining the level of expression of the resulting therapeutic protein or by evaluating 
another parameter, e.g., a parameter correlated to the level of expression. A candidate sequence 
which produces an increased level of a therapeutic protein as compared to an unaltered candidate 
sequence is chosen. 

In another approach, one or a group of codons can be modified, e.g., without reference to 
protein or message structure, and tested. Alternatively, one or more codons can be chosen based on a 
message-level property, e.g., location in a region of predetermined, e.g., high or low, GC or AU 
content, location in a region having a structure such as an enhancer or silencer, location in a 
region that can be modified to introduce a structure such as an enhancer or silencer, location in a 
region having, or predicted to have, secondary or tertiary structure, e.g., intra-chain or 
inter-chain pairing, or location in a region lacking, or predicted to lack, secondary or tertiary 
structure, e.g., intra-chain or inter-chain pairing. A particular modified region is chosen if it 
produces the desired result. 

Methods which systematically generate candidate sequences are useful. For example, 
one or a group, e.g., a contiguous block of codons, at various positions of a synthetic nucleic acid 
sequence can be replaced with common codons (or with non common codons, if for example, the 
starting sequence has been optimized) and the resulting sequence evaluated. Candidates can be 
generated by optimizing (or de-optimizing) a given "window" of codons in the sequence to 
generate a first candidate, and then moving the window to a new position in the sequence, and 
optimizing (or de-optimizing) the codons in the new position under the window to provide a 
second candidate. Candidates can be evaluated by determining the level of expression they 
provide, or by evaluating another parameter, e.g., a parameter correlated to the level of 
expression. Some parameters can be evaluated by inspection or computationally, e.g., the 
possession or lack thereof of high or low GC or AU content; a sequence element such as an 
enhancer or silencer; secondary or tertiary structure, e.g., intra-chain or inter-chain pairing. 
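
The moving-window generation of candidates can be sketched as follows. This is an illustrative outline only: the window size, the step, the function names and the small synonymous-codon mapping used in the example are assumptions. The `convert` argument stands for whatever codon change (optimizing or de-optimizing) is applied under the window.

```python
from typing import Callable, Iterator


def window_candidates(
    codons: list[str],
    window: int,
    step: int,
    convert: Callable[[str], str],
) -> Iterator[tuple[int, list[str]]]:
    """Yield (start, candidate) pairs; in each candidate only the codons
    under the window starting at `start` have been converted."""
    for start in range(0, len(codons) - window + 1, step):
        changed = [convert(c) for c in codons[start:start + window]]
        yield start, codons[:start] + changed + codons[start + window:]


# Toy mapping from a non-common codon to the common codon for the same
# amino acid (Ala, Leu, Ser); codons not in the mapping are left unchanged.
TO_COMMON = {"GCA": "GCC", "TTA": "CTG", "TCT": "AGC"}


def to_common(codon: str) -> str:
    return TO_COMMON.get(codon, codon)


message = ["GCA", "TTA", "TCT", "GCA"]
for start, candidate in window_candidates(message, window=2, step=1, convert=to_common):
    print(start, "".join(candidate))
```

Each candidate produced this way can then be evaluated for expression level or for a correlated parameter, as described above.
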

Thus, hybrid messages, i.e., messages having a region which is optimized and a region 
which is not optimized, can be evaluated to determine if they have a desired property. The 
evaluation can be effected by, e.g., synthesizing the candidate message or messages, and 
determining a property such as its level of expression. Such a determination can be made in a 
cell-free system or in a cell-based system. The generation and testing of one or more candidates 
can also be performed by computational methods, e.g., on a computer. For example, a 
computer program can be used to generate a number of candidate messages and those messages 
analyzed by a computer program which predicts the existence of primary structure elements or 
secondary or tertiary structure. 

A candidate message can be generated by dividing a region into subregions and 
optimizing each subregion. An optimized subregion is then combined with a non-optimized 
subregion to produce a candidate. For example, a region is divided into three subregions, a, b 
and c, each of which is then optimized to provide optimized subregions a', b' and c'. The 
optimized subregions, a', b', and c' can then be combined with one or more of the non-optimized 
subregions, e.g., a, b and c. For example, ab'c could be formed and tested. Different 
combinations of optimized and non-optimized subregions can be generated. By evaluating a 
series of such hybrid candidate sequences, it is possible to analyze the effect of modification of 
different subregions and, e.g., to define the particular version of each subregion that contributes 
most to the desired property. A preferred candidate can include the versions of each subregion 
that performed best in a series of such experiments. 
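
The combinations of optimized and non-optimized subregions can be enumerated mechanically. The following is a minimal sketch, assuming each subregion is supplied in both its original and its optimized version; the function name and the toy three-codon subregions are hypothetical.

```python
from itertools import product
from typing import Iterator


def hybrid_candidates(
    subregions: list[tuple[str, str, str]],
) -> Iterator[tuple[str, str]]:
    """subregions holds (name, original, optimized) triples in 5' to 3' order.
    Yields (label, sequence) for every mix of original and optimized
    versions; an optimized subregion is marked with a prime in the label."""
    for choice in product((False, True), repeat=len(subregions)):
        label = "".join(
            name + ("'" if use_opt else "")
            for (name, _, _), use_opt in zip(subregions, choice)
        )
        seq = "".join(
            opt if use_opt else orig
            for (_, orig, opt), use_opt in zip(subregions, choice)
        )
        yield label, seq


# Toy subregions a, b, c, each a single codon with an optimized counterpart.
regions = [("a", "GCA", "GCC"), ("b", "TTA", "CTG"), ("c", "TCT", "AGC")]
for label, seq in hybrid_candidates(regions):
    print(label, seq)
```

For three subregions this yields eight candidates, ranging from the fully non-optimized abc to the fully optimized a'b'c' and including the hybrid ab'c mentioned above.
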

An algorithm for creating an optimized candidate sequence is as follows: 

1. Provide a message sequence (an entire message or a portion thereof). Go to step 2. 

2. Generate a novel candidate sequence by modifying the codon usage of a candidate 
sequence, by using the most promising candidate sequence previously identified, or 
by combining regions of two or more candidates previously identified to produce a 
novel hybrid. Go to step 3. 

3. Evaluate the candidate sequence and determine if it has a predetermined property. If 
the candidate has the predetermined property, then proceed to step 4, otherwise 
proceed to step 2. 

4. Use the candidate sequence as an optimized message. 
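
The four steps above can be sketched as a loop. This is an outline under stated assumptions rather than a definitive implementation: `propose` stands for any way of generating a novel candidate (step 2), `has_property` stands for the evaluation (step 3), which in practice may be an experimental measurement rather than a computation, and the `max_rounds` cap is an added safeguard that is not part of the steps as written. The toy functions in the usage example are hypothetical.

```python
from typing import Callable, Optional


def optimize_iteratively(
    message: str,
    propose: Callable[[str], str],
    has_property: Callable[[str], bool],
    max_rounds: int = 100,
) -> Optional[str]:
    """Repeat steps 2 and 3 until a candidate has the predetermined property."""
    candidate = message                   # step 1: provide a message sequence
    for _ in range(max_rounds):
        candidate = propose(candidate)    # step 2: generate a novel candidate
        if has_property(candidate):       # step 3: evaluate the candidate
            return candidate              # step 4: use it as the optimized message
    return None                           # no suitable candidate within the cap


def swap_first_gca(seq: str) -> str:
    """Toy proposal: change the first in-frame GCA codon to GCC (both Ala)."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    for i, codon in enumerate(codons):
        if codon == "GCA":
            codons[i] = "GCC"
            break
    return "".join(codons)


def no_gca(seq: str) -> bool:
    """Toy property: no in-frame GCA codon remains."""
    return all(seq[i:i + 3] != "GCA" for i in range(0, len(seq), 3))


print(optimize_iteratively("GCAGCAGCA", swap_first_gca, no_gca))  # GCCGCCGCC
```
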

Methods can include first optimizing a mammalian synthetic nucleic acid sequence which 
encodes a protein of interest or a portion thereof, e.g., human Factor VIII, etc. The synthetic 
nucleic acid sequence can be optimized such that 94%, 95%, 96%, 97%, 98%, 99%, or all, of the 
codons of the synthetic DNA are replaced with common codons. The next step involves 
determining the amount of protein produced as a result of message optimization compared to the 
amount of protein produced using the wild type sequence. In instances where the amount of 
protein produced is not of the desired or expected level, it may be desirable to replace one or 
more of the common codons of the protein coding region with a less-common codon or non- 
common codon. A mammalian optimized message which is re-engineered such that common 
codons are replaced with less-common or non-common mammalian codons, or common codons 
of other eukaryotic species can result in at least 1%, 5%, 10%, 20% or more of the common 
codons being replaced. Re-engineering the optimized message can be done, for example, 
systematically by replacing a single common codon with a less-common or non-common codon. 
Alternatively, a block of 2, 4, 6, 10, 20, 40 or more codons may be replaced with less-common 
or non-common codons. The level of protein produced by these "re-engineered optimized" 
messages determines which re-engineered optimized message is chosen. 

Another approach of optimizing a message for increased protein expression includes 
altering the specific nucleotide content of an optimized synthetic nucleic acid sequence. The 
synthetic nucleic acid sequence can be altered by increasing or decreasing specific nucleotide(s) 
content, e.g., G, C, A, T, GC or AT content of the sequence. Increasing or decreasing the 
specific nucleotide content of a synthetic nucleotide sequence can be done by substituting the 
nucleotide of interest with another nucleotide. For example, in a sequence that has a large number
of codons that have a high GC content, e.g., glycine (GGC), those codons can be substituted with codons that
have a less GC rich content, e.g., glycine (GGT) or an AT rich codon. Similarly, in a sequence that
has a large number of codons that have a high AT content, those codons can be substituted with codons that
have a less AT rich content, e.g., a GC rich codon. Any region, or all, of a synthetic nucleic acid
sequence can be altered in this manner, e.g., the 5'UTR (e.g., the promoter-proximal coding 
region), the coding region, the intron sequence, or the 3'UTR. Preferably, nucleotide 
substitutions in the coding region do not result in an alteration of the amino acid sequence of the 
expressed product. Preferably, the nucleotide content, e.g., GC or AT content, of a sequence is 
increased or reduced by 10%, 20%, 30%, 40% or more. 
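
As a worked illustration of altering nucleotide content, the following Python sketch computes the GC content of a short coding sequence before and after substituting glycine (GGC) with glycine (GGT). The example sequence and the function names are assumptions made for illustration.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def swap_codon(seq, old, new):
    """Replace each in-frame occurrence of codon `old` with codon `new`."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return "".join(new if c == old else c for c in codons)


def relative_change(old_seq, new_seq):
    """Change in GC content relative to the starting GC content."""
    return (gc_content(new_seq) - gc_content(old_seq)) / gc_content(old_seq)


before = "GGCGGCAAGGGC"                   # Gly-Gly-Lys-Gly, 10 of 12 bases G/C
after = swap_codon(before, "GGC", "GGT")  # "GGTGGTAAGGGT", 7 of 12 bases G/C
change = relative_change(before, after)   # about -0.30, i.e. a 30% reduction
```

Because GGC and GGT both encode glycine, the substitution lowers the GC content without altering the amino acid sequence of the expressed product.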

The synthetic nucleic acid sequence can encode a mammalian, e.g., a human protein.
The protein can be, e.g., one which is endogenously human, or an engineered protein.
Engineered proteins include proteins which differ from the native protein by one or more amino 
acid residues. Examples of such proteins include fragments, e.g., internal fragments or 
truncations, deletions, fusion proteins, and proteins having one or more amino acid replacements. 

A sequence which encodes the protein can have one or more introns. The synthetic 
nucleic acid sequence can include introns, as they are found in the non-optimized sequence or 
can include introns from a non-related gene. In other embodiments the intronic sequences can be 
modified. For example, all or part of one or more introns present in the gene can be removed or 
introns not found in the sequence can be added. In preferred embodiments, one or more entire 
introns present in the gene are not present in the synthetic nucleic acid. In another embodiment, 
all or part of an intron present in a gene is replaced by another sequence, e.g., an intronic 
sequence from another protein. 

The synthetic nucleic acid sequence can encode: any protein including a blood factor,
e.g., blood clotting factor V, blood clotting factor VII, blood clotting factor VIII, blood clotting
factor IX, blood clotting factor X, or blood clotting factor XIII; an interleukin, e.g., interleukin 1,
interleukin 2, interleukin 3, interleukin 6, interleukin 11, or interleukin 12; erythropoietin;
calcitonin; growth hormone; insulin; insulinotropin; insulin-like growth factors; parathyroid
hormone; β-interferon; γ-interferon; nerve growth factors; FSHβ; tumor necrosis factor;
glucagon; bone growth factor-2; bone growth factor-7; TSH-β; CSF-granulocyte;
CSF-macrophage; CSF-granulocyte/macrophage; immunoglobulins; catalytic antibodies; protein
kinase C; glucocerebrosidase; superoxide dismutase; tissue plasminogen activator; urokinase;
antithrombin III; DNAse; α-galactosidase; tyrosine hydroxylase; apolipoprotein E;
apolipoprotein A-I; globins; low density lipoprotein receptor; IL-2 receptor; IL-2 antagonists;
alpha-1 antitrypsin; immune response modifiers; soluble CD4; a protein expressed under disease
conditions; and proteins encoded by viruses, e.g., proteins which are encoded by a virus
(including a retrovirus) which are expressed in mammalian cells post-infection.

In preferred embodiments, the synthetic nucleic acid sequence can express its protein, 
e.g., a eukaryotic e.g., mammalian, protein, at a level which is at least 110%, 150%, 200%, 
500%, 1,000%, 5,000% or even 10,000% of that expressed by a nucleic acid sequence that has not
been optimized. This comparison can be made, e.g., in an in vitro mammalian cell culture
system wherein the non-optimized and optimized sequences are expressed under the same
conditions (e.g., the same cell type, same culture conditions, same expression vector). 

Suitable cell culture systems for measuring expression of the synthetic nucleic acid
sequence and corresponding non-optimized nucleic acid sequence are known in the art (e.g., the
pBS phagemid vectors, Stratagene, La Jolla, CA) and are described in, for example, the standard
molecular biology reference books. Vectors suitable for expressing the synthetic and non- 
optimized nucleic acid sequences encoding the protein of interest are described below and in the 
standard reference books described below. Expression can be measured using an antibody 
specific for the protein of interest (e.g., ELISA). Such antibodies and measurement techniques 
are known to those skilled in the art. 

In a preferred embodiment the protein is a human protein. In more preferred
embodiments, the protein is human Factor VIII and the protein is a B domain deleted human
Factor VIII. In another preferred embodiment the protein is B domain deleted human Factor
VIII with a sequence which includes a recognition site for an intracellular protease of the
PACE/furin class, such as an X-ARG-X-X-ARG site, a short-peptide linker, e.g., a two peptide
linker, e.g., a leucine-glutamic acid peptide linker (LE), or a three, or four peptide linker, inserted
at the heavy-light chain junction (see Fig. 1).

A large fraction of the codons in the human messages encoding Factor VIII and Factor
IX are non-common codons or less common codons. Replacement of at least 98% of these
codons with common codons will yield nucleic acid sequences capable of higher level
expression in a cell culture. Preferably, all of the codons are replaced with common codons and
such replacement results in at least a 5 fold, more preferably a 10 fold and most preferably a 20
fold increase in expression when compared to expression of the corresponding native
sequence in the same expression system.

The synthetic nucleic acid sequences of the invention can be introduced into the cells of a 
living organism. The sequences can be introduced directly, e.g., via homologous recombination, 
or via a vector. For example, DNA constructs or vectors can be used to introduce a synthetic 
nucleic acid sequence into cells of a living organism for gene therapy. See, e.g., U.S. Patent No. 
5,460,959; and co-pending U.S. applications USSN 08/334,797; USSN 08/231,439; USSN 
08/334,455; and USSN 08/928,881 which are hereby expressly incorporated by reference in their 
entirety. 

Transfected or Infected Cells 

Primary and secondary cells to be transfected can be obtained from a variety of tissues 
and include cell types which can be maintained and propagated in culture. For example, primary 
and secondary cells which can be transfected include fibroblasts, keratinocytes, epithelial cells 
(e.g., mammary epithelial cells, intestinal epithelial cells), endothelial cells, glial cells, neural 
cells, a cell comprising a formed element of the blood (e.g., lymphocytes, bone marrow cells), 
muscle cells and precursors of these somatic cell types. Primary cells are preferably obtained 
from the individual to whom the transfected primary or secondary cells are administered. 
However, primary cells may be obtained from a donor (other than the recipient) of the same 
species or another species (e.g., mouse, rat, rabbit, cat, dog, pig, cow, bird, sheep, goat, horse). 

Primary or secondary cells of vertebrate, particularly mammalian, origin can be 
transfected with exogenous synthetic DNA encoding a therapeutic protein and produce an 
encoded therapeutic protein stably and reproducibly, both in vitro and in vivo, over extended 
periods of time. In addition, the transfected primary and secondary cells can express the encoded
product in vivo at physiologically relevant levels; the cells can be recovered after implantation and,
upon reculturing, grow and display their preimplantation properties.

The transfected primary or secondary cells may also include DNA encoding a selectable 
marker which confers a selectable phenotype upon them, facilitating their identification and 
isolation. Methods for producing transfected primary or secondary cells which stably express
exogenous synthetic DNA, clonal cell strains and heterogenous cell strains of such transfected
cells, methods of producing the clonal and heterogenous cell strains, and methods of treating or
preventing an abnormal or undesirable condition through the use of populations of transfected
primary or secondary cells are part of the present invention. Primary and secondary cells which
can be transfected include fibroblasts, keratinocytes, epithelial cells (e.g., mammary epithelial 
cells, intestinal epithelial cells), endothelial cells, glial cells, neural cells, a cell comprising a 
formed element of the blood (e.g., a lymphocyte, a bone marrow cell), muscle cells and 
precursors of these somatic cell types. Primary cells are preferably obtained from the individual 
to whom the transfected primary or secondary cells are administered. However, primary cells 
may be obtained from a donor (other than the recipient) of the same species or another species 
(e.g., mouse, rat, rabbit, cat, dog, pig, cow, bird, sheep, goat, horse). Transformed or
immortalized cells can also be used, e.g., a Bowes Melanoma cell (ATCC Accession No. CRL
9607), a Daudi cell (ATCC Accession No. CCL 213), a HeLa cell and a derivative of a HeLa cell
(ATCC Accession Nos. CCL 2, CCL 2.1, and CCL 2.2), a HL-60 cell (ATCC Accession No.
CCL 240), a HT1080 cell (ATCC Accession No. CCL 121), a Jurkat cell (ATCC Accession No.
TIB 152), a KB carcinoma cell (ATCC Accession No. CCL 17), a K-562 leukemia cell (ATCC
Accession No. CCL 243), a MCF-7 breast cancer cell (ATCC Accession No. HTB 22), a MOLT-4
cell (ATCC Accession No. 1582), a Namalwa cell (ATCC Accession No. CRL 1432), a Raji
cell (ATCC Accession No. CCL 86), a RPMI 8226 cell (ATCC Accession No. CCL 155), a U-937
cell (ATCC Accession No. CRL 1593), WI-38VA13 subline 2RA cells (ATCC Accession
No. CCL 75.1), a CCRF-CEM cell (ATCC Accession No. CCL 119) and a 2780AD ovarian
carcinoma cell (Van der Bliek et al., Cancer Res. 48: 5927-5932, 1988), as well as
heterohybridoma cells produced by fusion of human cells and cells of another species. In
another embodiment, the immortalized cell line can be a cell line other than a human cell line, 
e.g., a CHO cell line. In a preferred embodiment, the cell is a non-transformed cell. In various 
preferred embodiments, the cell is a mammalian cell, e.g., a primary or secondary mammalian 
cell, e.g., a fibroblast, a hematopoietic stem cell, a myoblast, a keratinocyte, an epithelial cell, an 
endothelial cell, a glial cell, a neural cell, a cell comprising a formed element of the blood, a 
muscle cell and precursors of these somatic cells. In a most preferred embodiment, the cell is a 
secondary human fibroblast. 

Alternatively, DNA can be delivered into any of the cell types discussed above by a viral 
vector infection. Viruses known to be useful for gene transfer include adenoviruses,
adeno-associated virus, herpes virus, mumps virus, poliovirus, retroviruses, Sindbis virus, and vaccinia
virus such as canary pox virus. Use of viral vectors is well known in the art: see e.g., Robbins
and Ghizzani, "Viral Vectors for Gene Therapy", Mol Med Today 1:410-417, 1995. A cell
which has an exogenous DNA introduced into it by a viral vector is referred to as an "infected
cell".

The invention also includes the genetic manipulation of a cell which normally produces a 
therapeutic protein. In this instance, the cell is manipulated such that the endogenous sequence 
which encodes the therapeutic protein is replaced with an optimized coding sequence, e.g., by 
homologous recombination. 

Exogenous Synthetic DNA 

Exogenous synthetic DNA incorporated into primary or secondary cells by the present 
method can be a synthetic DNA which encodes a protein, or a portion thereof, useful to treat an 
existing condition or prevent it from occurring. 

Synthetic DNA incorporated into primary or secondary cells can be an entire gene 
encoding an entire desired protein or a gene portion which encodes, for example, the active or 
functional portion(s) of the protein. The protein can be, for example, a hormone, a cytokine, an
antigen, an antibody, an enzyme, a clotting factor, e.g., Factor VIII or Factor XI, a transport
protein, a receptor, a regulatory protein, a structural protein, or a protein which does not occur in
nature. The DNA can be produced using genetic engineering techniques or synthetic processes.
The DNA introduced into primary or secondary cells can encode one or more therapeutic 
proteins. After introduction into primary or secondary cells, the exogenous synthetic DNA is 
stably incorporated into the recipient cell's genome (along with the additional sequences present 
in the DNA construct used), from which it is expressed or otherwise functions. Alternatively, the 
exogenous synthetic DNA may exist episomally within the primary or secondary cells. 

Selectable Markers 

A variety of selectable markers can be incorporated into primary or secondary cells. For 
example, a selectable marker which confers a selectable phenotype such as drug resistance, 
nutritional auxotrophy, resistance to a cytotoxic agent or expression of a surface protein, can be 
used. Selectable marker genes which can be used include neo, gpt, dhfr, ada, pac (puromycin),
hyg and hisD. The selectable phenotype conferred makes it possible to identify and isolate
recipient primary or secondary cells.

DNA Constructs 

DNA constructs, which include exogenous synthetic DNA and, optionally, DNA 
encoding a selectable marker, along with additional sequences necessary for expression of the 
exogenous synthetic DNA in recipient primary or secondary cells, are used to transfect primary 
or secondary cells in which the encoded protein is to be produced. Alternatively, infectious 
vectors, such as retroviral, herpes, lentivirus, adenovirus, adeno-associated virus, mumps and
poliovirus vectors, can be used for this purpose. 

A DNA construct which includes the exogenous synthetic DNA and additional 
sequences, such as sequences necessary for expression of the exogenous synthetic DNA, can be 
used. A DNA construct which includes DNA encoding a selectable marker, along with 
additional sequences, such as a promoter, polyadenylation site and splice junctions, can be used 
to confer a selectable phenotype upon introduction into primary or secondary cells. The two 
DNA constructs are introduced into primary or secondary cells, using methods described herein. 
Alternatively, one DNA construct which includes exogenous synthetic DNA, a selectable 
marker gene and additional sequences (e.g., those necessary for expression of the exogenous 
synthetic DNA and for expression of the selectable marker gene) can be used. 

Transfection of Primary or Secondary Cells and Production of Clonal or Heterogenous Cell 
Strains 

Vertebrate tissue can be obtained by standard methods such as punch biopsy or other 
surgical methods of obtaining a tissue source of the primary cell type of interest. For example, 
punch biopsy is used to obtain skin as a source of fibroblasts or keratinocytes. A mixture of 
primary cells is obtained from the tissue, using known methods, such as enzymatic digestion. If 
enzymatic digestion is used, enzymes such as collagenase, hyaluronidase, dispase, pronase, 
trypsin, elastase and chymotrypsin can be used. 

The resulting primary cell mixture can be transfected directly or it can be cultured first,
removed from the culture plate and resuspended before transfection is carried out. Primary cells
or secondary cells are combined with exogenous synthetic DNA to be stably integrated into their
genomes and, optionally, DNA encoding a selectable marker, and treated in order to accomplish
transfection. The exogenous synthetic DNA and selectable marker-encoding DNA are each on a
separate construct or on a single construct, and an appropriate quantity of DNA is used to ensure
that at least one stably transfected cell containing and appropriately expressing exogenous DNA is
produced. In general, 0.1 to 500 µg DNA is used.

Primary or secondary cells can be transfected by electroporation. Electroporation is
carried out at appropriate voltage and capacitance (and time constant) to result in entry of the
DNA construct(s) into the primary or secondary cells. Electroporation can be carried out over a
wide range of voltages (e.g., 50 to 2000 volts) and capacitance values (e.g., 60-300 µFarads).
Total DNA of approximately 0.1 to 500 µg is generally used.

Primary or secondary cells can be transfected using microinjection. Alternatively, known 
methods such as calcium phosphate precipitation, modified calcium phosphate precipitation and 
polybrene precipitation, liposome fusion and receptor-mediated gene delivery can be used to 
transfect cells. A stably transfected cell is isolated, cultured and subcultivated, under
culturing conditions and for sufficient time, to propagate the stably transfected secondary cells
and produce a clonal cell strain of transfected secondary cells. Alternatively, more than one
transfected cell is cultured and subcultivated, resulting in production of a heterogenous cell
strain.

Transfected primary or secondary cells undergo a sufficient number of doublings to 
produce either a clonal cell strain or a heterogenous cell strain of sufficient size to provide the 
therapeutic protein to an individual in effective amounts. In general, for example, 0.1 cm² of
skin is biopsied and assumed to contain 100,000 cells; one cell is used to produce a clonal cell 
strain and undergoes approximately 27 doublings to produce 100 million transfected secondary 
cells. If a heterogenous cell strain is to be produced from an original transfected population of 
approximately 100,000 cells, only 10 doublings are needed to produce 100 million transfected 
cells. 
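
The doubling figures above follow from simple powers of two, as the following Python sketch shows. The function names are hypothetical and the calculation assumes every cell divides at each doubling.

```python
import math


def cells_after(start_cells, doublings):
    """Cell number after the given number of population doublings."""
    return start_cells * 2 ** doublings


def doublings_needed(start_cells, target_cells):
    """Smallest whole number of doublings that reaches the target."""
    return math.ceil(math.log2(target_cells / start_cells))


clonal = doublings_needed(1, 100_000_000)              # 27 doublings
heterogenous = doublings_needed(100_000, 100_000_000)  # 10 doublings

clonal_yield = cells_after(1, 27)                      # 134,217,728 cells
heterogenous_yield = cells_after(100_000, 10)          # 102,400,000 cells
```

A single cell therefore exceeds 100 million cells after 27 doublings, whereas a starting population of 100,000 cells does so after only 10.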

The number of required cells in a transfected clonal or heterogenous cell strain is variable 
and depends on a variety of factors, including but not limited to, the use of the transfected cells, 
the functional level of the exogenous DNA in the transfected cells, the site of implantation of the 
transfected cells (for example, the number of cells that can be used is limited by the anatomical
site of implantation), and the age, surface area, and clinical condition of the patient. To put these
factors in perspective, to deliver therapeutic levels of human growth hormone in an otherwise 
healthy 10 kg patient with isolated growth hormone deficiency, approximately one to five 
hundred million transfected fibroblasts would be necessary (the volume of these cells is about 
that of the very tip of the patient's thumb). 

Episomal Expression of Exogenous Synthetic DNA 

DNA sequences that are present within the cell yet do not integrate into the genome are 
referred to as episomes. Recombinant episomes may be useful in at least three settings: 1) if a 
given cell type is incapable of stably integrating the exogenous synthetic DNA; 2) if a given cell 
type is adversely affected by the integration of synthetic DNA; and 3) if a given cell type is 
capable of improved therapeutic function with an episomal rather than integrated synthetic DNA. 

Using transfection and culturing as described herein, exogenous synthetic DNA in the 
form of episomes can be introduced into vertebrate primary and secondary cells. Plasmids can
be converted into such an episome by the addition of DNA sequences for the Epstein-Barr virus
origin of replication and nuclear antigen (Yates, J.L. Nature 319:780-7883 (1985)).
Alternatively, vertebrate autonomously replicating sequences can be introduced into the
construct (Weidle, U.H. Gene 73(2):427-437 (1988)). These and other episomally derived
sequences can also be included in DNA constructs without selectable markers, such as pXGH5
(Selden et al., Mol Cell Biol 6:3173-3179, 1986). The episomal synthetic exogenous DNA is
then introduced into primary or secondary vertebrate cells as described in this application (if a
selective marker is included in the episome, a selective agent is used to treat the transfected cells).

Implantation of Clonal Cell Strain or Heterogenous Cell Strain of Transfected Secondary Cells 

The transfected cells produced as described above can be introduced into an individual to
whom the therapeutic protein is to be delivered, using known methods. The clonal cell strain or
heterogenous cell strain is introduced into the individual using
various routes of administration and at various sites (e.g., renal subcapsular, subcutaneous,
central nervous system (including intrathecal), intravascular, intrahepatic, intrasplanchnic,
intraperitoneal (including intraomental), or intramuscular implantation). In a preferred
embodiment, the clonal cell strain or heterogenous cell strain is introduced into the omentum.
The omentum is a membranous structure containing a sheet of fat. Usually, the omentum is a
fold of peritoneum extending from the stomach to adjacent abdominal organs. The greater
omentum is attached to the inferior edge of the stomach and hangs down in front of the intestines.
The other edge is attached to the transverse colon. The lesser omentum is attached to the
superior edge of the stomach and extends to the undersurface of the liver. The cells may be
introduced into any part of the omentum by surgical implantation, laparoscopy or direct
injection, e.g., via CT-guided needle or ultrasound. Once implanted in the individual, the cells
produce the therapeutic product encoded by the exogenous synthetic DNA or are affected by the 
exogenous synthetic DNA itself. For example, an individual who has been diagnosed with 
Hemophilia A, a bleeding disorder that is caused by a deficiency in Factor VIII, a protein 
normally found in the blood, is a candidate for a gene therapy treatment. In another example, an 
individual who has been diagnosed with Hemophilia B, a bleeding disorder that is caused by a 
deficiency in Factor IX, a protein normally found in the blood, is a candidate for a gene therapy 
treatment. The patient has a small skin biopsy performed; this is a simple procedure which can
be performed on an out-patient basis. The piece of skin, approximately the size of a matchhead,
is taken, for example, from under the arm and requires about one minute to remove. The sample
is processed, resulting in isolation of the patient's cells, which are genetically engineered to produce the
missing Factor IX or Factor VIII. Based on the age, weight, and clinical condition of the patient,
the required number of cells are grown in large-scale culture. The entire process requires 4-6 
weeks and, at the end of that time, the appropriate number, e.g., approximately 100-500 million 
genetically-engineered cells are introduced into the individual, once again as an outpatient (e.g., 
by injecting them back under the patient's skin). The patient is now capable of producing his or 
her own Factor IX or Factor VIII and is no longer a hemophiliac. 

A similar approach can be used to treat other conditions or diseases. For example, short 
stature can be treated by administering human growth hormone to an individual by implanting 
primary or secondary cells which express human growth hormone; anemia can be treated by 
administering erythropoietin (EPO) to an individual by implanting primary or secondary cells 
which express EPO; or diabetes can be treated by administering glucagon-like peptide-1 (GLP-1)
to an individual by implanting primary or secondary cells which express GLP-1. A lysosomal
storage disease (LSD) can be treated by this approach. LSDs represent a group of at least 41
distinct genetic diseases, each one representing a deficiency of a particular protein that is
involved in lysosomal biogenesis. A particular LSD can be treated by administering a lysosomal
enzyme to an individual by implanting primary or secondary cells which express the lysosomal
enzyme, e.g., Fabry Disease can be treated by administering α-galactosidase to an individual by
implanting primary or secondary cells which express α-galactosidase; Gaucher disease can be
treated by administering β-glucoceramidase to an individual by implanting primary or secondary
cells which express β-glucoceramidase; MPS (mucopolysaccharidosis) type I (Hurler-Scheie
syndrome) can be treated by administering α-iduronidase to an individual by implanting primary
or secondary cells which express α-iduronidase; MPS type II (Hunter syndrome) can be treated
by administering α-L-iduronidase to an individual by implanting primary or secondary cells
which express α-L-iduronidase; MPS type III-A (Sanfilippo A syndrome) can be treated by
administering glucosamine-N-sulfatase to an individual by implanting primary or secondary cells
which express glucosamine-N-sulfatase; MPS type III-B (Sanfilippo B syndrome) can be treated
by administering alpha-N-acetylglucosaminidase to an individual by implanting primary or
secondary cells which express alpha-N-acetylglucosaminidase; MPS type III-C (Sanfilippo C
syndrome) can be treated by administering acetylcoenzyme A:α-glucosaminide-N-acetyltransferase
to an individual by implanting primary or secondary cells which express
acetylcoenzyme A:α-glucosaminide-N-acetyltransferase; MPS type III-D (Sanfilippo D
syndrome) can be treated by administering N-acetylglucosamine-6-sulfatase to an individual by
implanting primary or secondary cells which express N-acetylglucosamine-6-sulfatase; MPS
type IV-A (Morquio A syndrome) can be treated by administering N-acetylglucosamine-6-sulfatase
to an individual by implanting primary or secondary cells which express
N-acetylglucosamine-6-sulfatase; MPS type IV-B (Morquio B syndrome) can be treated by
administering β-galactosidase to an individual by implanting primary or secondary cells which
express β-galactosidase; MPS type VI (Maroteaux-Lamy syndrome) can be treated by
administering N-acetylgalactosamine-6-sulfatase to an individual by implanting primary or
secondary cells which express N-acetylgalactosamine-6-sulfatase; MPS type VII (Sly syndrome)
can be treated by administering β-glucuronidase to an individual by implanting primary or
secondary cells which express β-glucuronidase.

The cells used for implantation will generally be patient-specific genetically-engineered 
cells. It is possible, however, to obtain cells from another individual of the same species or from
a different species. Use of such cells might require administration of an immunosuppressant,
alteration of histocompatibility antigens, or use of a barrier device to prevent rejection of the 
implanted cells. For many diseases, this will be a one-time treatment and, for others, multiple 
gene therapy treatments will be required. 

Uses of Transfected or Infected Primary and Secondary Cells and Cell Strains

Transfected or infected primary or secondary cells or cell strains have wide applicability
as a vehicle or delivery system for therapeutic proteins, such as enzymes, hormones, cytokines,
antigens, antibodies, clotting factors, anti-sense RNA, regulatory proteins, transcription proteins,
receptors, structural proteins, novel (non-optimized) proteins and nucleic acid products, and
engineered DNA. For example, transfected primary or secondary cells can be used to supply a
therapeutic protein, including, but not limited to, Factor VIII, Factor IX, erythropoietin, alpha-1
antitrypsin, calcitonin, glucocerebrosidase, growth hormone, low density lipoprotein (LDL)
receptor, IL-2 receptor and its antagonists, insulin, globin, immunoglobulins, catalytic antibodies,
the interleukins, insulin-like growth factors, superoxide dismutase, immune response modifiers,
parathyroid hormone and interferon, nerve growth factors, tissue plasminogen activators, and
colony stimulating factors. Alternatively, transfected primary and secondary cells can be used to 
immunize an individual (i.e., as a vaccine). 

The wide variety of uses of cell strains of the present invention can perhaps most 
conveniently be summarized as shown below. The cell strains can be used to deliver the 
following therapeutic products. 


1. a secreted protein with predominantly systemic effects;

2. a secreted protein with predominantly local effects;

3. a membrane protein imparting new or enhanced cellular responsiveness;

4. a membrane protein facilitating removal of a toxic product;

5. a membrane protein marking or targeting a cell;

6. an intracellular protein;

7. an intracellular protein directly affecting gene expression; and

8. an intracellular protein with autocrine effects.

Transfected or infected primary or secondary cells can be used to administer therapeutic
proteins (e.g., hormones, enzymes, clotting factors) which are presently administered 
intravenously, intramuscularly or subcutaneously, which requires patient cooperation and, often, 
medical staff participation. When transfected or infected primary or secondary cells are used, 
there is no need for extensive purification of the polypeptide before it is administered to an 
individual, as is generally necessary with an isolated polypeptide. In addition, transfected or 
infected primary or secondary cells of the present invention produce the therapeutic protein as it 
would normally be produced. 

An advantage to the use of transfected or infected primary or secondary cells is that by 
controlling the number of cells introduced into an individual, one can control the amount of the 
protein delivered to the body. In addition, in some cases, it is possible to remove the transfected 
or infected cells if there is no longer a need for the product. A further advantage of treatment by 
use of transfected or infected primary or secondary cells of the present invention is that 
production of the therapeutic product can be regulated, such as through the administration of 
zinc, steroids or an agent which affects transcription of a protein product or nucleic acid product
or affects the stability of a nucleic acid product. 

Transgenic animals 

A number of methods have been used to obtain transgenic, non-human mammals. A 
transgenic non-human mammal refers to a mammal that has gained an additional gene through 
the introduction of an exogenous synthetic nucleic acid sequence, i.e., transgene, into its own 
cells (e.g., both the somatic and germ cells), or into an ancestor's germ line. 

There are a number of methods to introduce the exogenous DNA into the germ line (e.g., 
introduction into the germ or somatic cells) of a mammal. One method is by microinjection of
the gene construct into the pronucleus of an early stage embryo (e.g., before the four-cell stage)
(Wagner, et al., Proc. Natl. Acad. Sci. USA 78:5016 (1981); Brinster, et al., Proc. Natl. Acad. Sci.
USA 82:4438 (1985)). The detailed procedure to produce such transgenic mice has been
described (see e.g., Hogan, et al., Manipulating the Mouse Embryo, Cold Spring Harbor
Laboratory, Cold Spring Harbor, NY (1986); US Patent No. 5,175,383 (1992)). This procedure
has also been adapted for other mammalian species (e.g., Hammer, et al., Nature 315:680 (1985);
Murray, et al., Reprod. Fert. Devl. 1:147 (1989); Pursel, et al., Vet. Immunol. Histopath. 17:303
(1987); Rexroad, et al., J. Reprod. Fert. 41 (suppl):119 (1990); Rexroad, et al., Molec. Reprod.
Devl. 1:164 (1989); Simons, et al., BioTechnology 6:179 (1988); Vize, et al., J. Cell. Sci. 90:295
(1988); and Wagner, J. Cell. Biochem. 13B (suppl):164 (1989)).

Another method for producing germ-line transgenic mammals is through the use of 
embryonic stem cells. The gene construct may be introduced into embryonic stem cells by 
homologous recombination (Thomas, et al., Cell 51:503 (1987); Capecchi, Science 244:1288 
(1989); Joyner, et al., Nature 338:153 (1989)). A suitable construct may also be introduced into 
the embryonic stem cells by DNA-mediated transfection, such as electroporation (Ausubel, et al., 
Current Protocols in Molecular Biology, John Wiley & Sons (1987)). Detailed procedures for 
culturing embryonic stem cells (e.g., ESD-3, ATCC# CCL-1934; ES-E14TG-2a, ATCC# CCL-1821; 
American Type Culture Collection, Rockville, MD) and the methods of making transgenic 
mammals from embryonic stem cells can be found in Teratocarcinomas and Embryonic Stem 
Cells, A Practical Approach, ed. E. J. Robertson (IRL Press, 1987). 

In the above methods for the generation of germ-line transgenic mammals, the construct 
may be introduced as a linear construct, as a circular plasmid, or as a vector which may be 
incorporated and inherited as a transgene integrated into the host genome. The transgene may 
also be constructed so as to permit it to be inherited as an extrachromosomal plasmid (Gassmann, 
M. et al., Proc. Natl. Acad. Sci. USA 92:1292 (1995)). 

Human Factor VIII 

hFVIII is encoded by a 186 kilobase (kb) gene, with the coding region distributed among 
26 exons (Gitschier et al., Nature 312:326-330 (1984)). Transcription of the gene and splicing 
of the resulting primary transcript results in an mRNA of approximately 9 kb which encodes a 
primary translation product containing 2351 amino acids (aa), including a 19 aa signal peptide. 
Excluding the signal peptide, the 2332 aa protein has a domain structure which can be 
represented as NH2-A1-A2-B-A3-C1-C2-COOH, with a predicted molecular mass of 265 
kilodaltons (kD). Glycosylation of this protein results in a product with a molecular mass of 
approximately 330 kD as determined by SDS-PAGE. In plasma, hFVIII is a heterodimeric 
protein consisting of a heavy chain that ranges in size from 90 kD to 200 kD in a metal ion 
complex with an 80 kD light chain. The heterodimeric complex is further stabilized by 
interactions with vWF. The heavy chain is comprised of domains A1-A2-B and the light chain is 
comprised of domains A3-C1-C2 (Figure 2). Protease cleavage sites in the B-domain account 
for the size variation of the heavy chain, with the 90 kD species containing no B-domain 
sequences and the 200 kD species containing a complete or nearly complete B-domain. The 
B-domain has no known function and it is fully removed upon hFVIII activation by thrombin. 

Human Factor VIII expression plasmids pXF8.186 (Figure 3), pXF8.61 (Figure 
4), pXF8.38 (Fig. 11) and pXF8.224 (Fig. 13) are described below. The hFVIII expression 
construct plasmid pXF8.186 was developed based on detailed optimization studies which 
resulted in high level expression of a functional hFVIII. Given the extremely large size of the 
hFVIII gene and the need to transfer the entire coding region into cells, cDNA expression 
plasmids were developed for the production of stably transfected clonal cell strains. It has 
proven difficult to achieve high level expression of hFVIII using the wild-type 9 kb cDNA. 
Three potential reasons for the poor expression are as follows. First, the wild-type cDNA 
encodes the 909 aa, heavily glycosylated B-domain which is transiently attached to the heavy 
chain and has no known function (Figure 1). Removal of the region encoding the B-domain 
from hFVIII expression constructs leads to greatly improved expression of a functional protein. 
Analysis of hFVIII derivatives lacking the B-domain has demonstrated that hFVIII function is 
not adversely affected and that such molecules have biochemical, immunologic, and in vivo 
functional properties which are very similar to the wild-type protein. Two different BDD hFVIII 
expression constructs have been developed, which encode proteins with different amino acid 
sequences flanking the deletion. Plasmid pXF8.186 contains a complete deletion of the 
B-domain (amino acids 741-1648 of the wild-type mature protein sequence), with the sequence 
Arg-Arg-Arg-Arg (RRRR) inserted at the heavy chain-light chain junction (Figure 1). This 
results in a string of five consecutive arginine residues (RRRRR or 5R) at the heavy chain-light 
chain junction, which comprises a recognition site for an intracellular protease of the PACE/furin 
class, and was predicted to promote cleavage to produce the correct heavy and light chains. 
Plasmid pXF8.61 also contains a complete deletion of the B-domain, with a synthetic XhoI site at 
the junction. This linker results in the presence of the dipeptide sequence Leu-Glu (LE) at the 
heavy chain-light chain junction. The proteins expressed from these two forms of BDD hFVIII are 
referred to herein as 5R and LE BDD hFVIII. 

The second feature which has been reported to adversely affect hFVIII expression in 
transfected cells relates to the observation that one or more regions of the coding region have 
been identified which effectively function to block transcription of the cDNA sequence. The 
inventors have now discovered that the negative influence of the sequence elements can be 
reduced or eliminated by altering the entire coding sequence. To this end, a completely synthetic 
B-domain deleted hFVIII cDNA was prepared as described in greater detail below. Silent base 
changes were made in all codons which did not correspond to the triplet sequence most 
frequently found for that amino acid in highly expressed human proteins, and such codons were 
converted to the codon sequence most frequently found in humans for the corresponding amino 
acid. The resulting coding sequence has a total of 1094 of 4335 base pairs which differ from the 
wild-type sequence, yet it encodes a protein with the wild-type hFVIII sequence (with the 
exception of the deletion of the B-domain). 25.2% of the bases were changed, and the GC 
content of the sequence increased from 44% to 64%. This sequence-altered BDD hFVIII cDNA 
is expressed at least 5.3-fold more efficiently than a non-altered control construct. 
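
The codon replacement rule described above can be illustrated with a minimal Python sketch. It is not the inventors' software: the preferred-codon table (`PREFERRED`), the function names, and the 18-base toy input are assumptions made for illustration, with the table chosen to agree with the codons that appear in the synthetic oligonucleotides of Table 3. Stop codons are simply left unchanged.

```python
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# ASSUMPTION: one preferred codon per amino acid, for illustration only.
PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def codons(cds):
    """Split a coding sequence into triplets; the length must be a multiple of 3."""
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate a coding sequence into one-letter amino acids ('*' = stop)."""
    return "".join(CODON_TO_AA[c] for c in codons(cds.upper()))


def optimize(cds):
    """Replace every sense codon by the preferred codon for the same amino acid."""
    out = []
    for codon in codons(cds.upper()):
        aa = CODON_TO_AA[codon]
        out.append(codon if aa == "*" else PREFERRED[aa])
    return "".join(out)


def base_changes(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    toy = "ATGCAAATAGAGCTCTCC"  # hypothetical input, not taken from the specification
    opt = optimize(toy)
    print(opt, base_changes(toy, opt), gc_fraction(toy), gc_fraction(opt))
```

For the toy input the sketch returns `ATGCAGATCGAGCTGAGC`, which differs at 5 of 18 positions while encoding the same six amino acids. Applied to a full coding sequence, `base_changes` and `gc_fraction` give the kind of summary figures quoted above (number of altered bases, GC content before and after).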

The third feature which was optimized to improve hFVIII expression was the intron-exon 
structure of the expression construct. The cDNA is, by definition, devoid of introns. While this 
reduces the size of the expression construct, it has been shown that introns can have strong 
positive effects on gene expression when added to cDNA expression constructs. The 5' 
untranslated region of the human beta-actin gene, which contains a complete, functional intron, 
was incorporated into the BDD hFVIII expression constructs pXF8.61 and pXF8.186. 

The fourth feature which can adversely affect hFVIII expression is the stability of the 
Factor VIII mRNA. The stability of the message can affect the steady-state level of the Factor 
VIII mRNA, and influence gene expression. Specific sequences within Factor VIII can be 
altered so as to increase the stability of the mRNA, e.g., the removal of AURE from the 3' UTR 
can result in a more stable Factor VIII mRNA. The data presented below show that coding 
sequence re-engineering has general utility for the improvement of expression of mammalian and 
non-mammalian eukaryotic genes in mammalian cells. The results obtained here with human 
Factor VIII suggest that systemic codon optimization (with disregard to CpG content) provides a 
fruitful strategy for improving the expression in mammalian cells of a wide variety of eukaryotic 
genes. 


Methods of Making Synthetic Nucleotide Sequences 
A synthetic nucleic acid sequence which directs the synthesis of an optimized message of 
the invention can be made, e.g., by any of the methods described herein. The methods described 
below are advantageous for making optimized messages for the following reasons: 

1) they allow for production of a highly optimized protein, e.g., a protein having at 
least 94 to 100% of codons as common codons, especially for proteins larger than 90 amino 
acids in length. The final product can be 100% optimized, i.e., every single nucleotide is as 
chosen, without the need to introduce undesirable alterations every 100-300 bp. A gene can be 
synthesized with 100% optimized codons, or it can be synthesized with 100% of the codons that are 
desired. Additional DNA sequence elements can be introduced or avoided without any 
limitations imposed by the need to introduce restriction enzyme sites. Such sequence elements 
could include: 

- Transcriptional signals, such as enhancers or silencers. 

- Splicing signals, for example avoiding cryptic splice sites in a cDNA, or optimizing the splice 
site context in an intron-containing gene. Adding an intron to a cDNA may aid expression and 
allows the introduction of transcriptional signals within the gene. 

- Instability signals - the creation or avoidance of sequences that direct mRNA breakdown. 

- Secondary structure - the creation or avoidance of secondary structures in the mRNA that may 
affect mRNA stability, transcriptional termination, or translation. 

- Translational signals - Codon choice. A gene can be synthesized with 100% optimal codons, or 
the codon bias for any amino acid can be altered without restriction to make gene expression 
sensitive to the concentration of an amino-acyl-tRNA, whose concentration may vary with 
growth or metabolic conditions. 

In each case, the goal may be to increase or decrease expression to bring expression 
under a particular form of regulation. 

2) they improve accuracy of the synthetic sequence because they avoid PCR 
amplification which introduces errors into the amplified sequence; and 

3) they reduce the cost of making the synthetic sequence of the invention. 

The synthetic nucleic acid sequences which direct the synthesis of the optimized messages 
of the invention can be prepared, e.g., by using the strategy which is outlined in greater detail 
below. 

Strategy for building a sequence 

The initial step is to devise a cloning protocol. 

A sequence file containing 100% of the desired DNA sequence is generated. This sequence 
is analyzed for restriction sites, including fusion sites. 
Fusion sites are, in order of preference: 

A) Sequences resulting from the ligation of two complementary overhangs normally generated 
by available restriction enzymes, e.g., 

SalI/XhoI = G^TCGAG 
            CAGCT^C 

or BspDI/BstBI = AT^CGAA 
                 TAGC^TT 

or BstBI/AccI = TT^CGAC 
                AAGC^TG. 

B) Sequences resulting from the ligation of two overhangs generated by partially filling-in the 
overhangs of available restriction enzymes, e.g., 

XhoI(+TC)/BamHI(+GA) = CTC^GATCC 
                       GAGCT^AGG. 

C) Sequences resulting from the blunt ligation of two blunt ends normally generated by available 
restriction enzymes, e.g., 

EheI/SmaI = GGC^GGG 
            CCG^CCC. 

D) Sequences resulting from the blunt ligation of two blunt ends, where one or both blunt ends 
have been generated by filling in an overhang, e.g., 

BamHI(+GATC)/SmaI = GGATC^GGG 
                    CCTAG^CCC 
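
The fusion sites of types (A) and (C) can be derived mechanically from the two recognition sequences. The following Python sketch is one way to do so; the function names are hypothetical, and it assumes palindromic recognition sequences with unambiguous bases, written on the top strand with `^` marking the cut.

```python
def parse_site(site):
    """Return (recognition sequence, top-strand cut index) for a site like 'G^TCGAC'."""
    cut = site.index("^")
    return site.replace("^", ""), cut


def overhang(site):
    """Single-stranded overhang left by a palindromic enzyme ('' for a blunt cutter)."""
    seq, cut = parse_site(site)
    lo, hi = sorted((cut, len(seq) - cut))
    return seq[lo:hi]


def overhang_type(site):
    """Classify the end as blunt, 5' overhang or 3' overhang."""
    seq, cut = parse_site(site)
    if 2 * cut == len(seq):
        return "blunt"
    return "5'" if 2 * cut < len(seq) else "3'"


def fusion_site(left, right):
    """Top-strand sequence formed by joining the left half of one site to the right half of another."""
    if overhang_type(left) != overhang_type(right) or overhang(left) != overhang(right):
        raise ValueError("ends are not compatible")
    lseq, lcut = parse_site(left)
    rseq, rcut = parse_site(right)
    return lseq[:lcut] + rseq[rcut:]


def regenerated(fusion, sites):
    """Names of enzymes whose recognition sequence reappears in the fusion site."""
    return [name for name, site in sites.items() if parse_site(site)[0] in fusion]


if __name__ == "__main__":
    sites = {"SalI": "G^TCGAC", "XhoI": "C^TCGAG"}
    joined = fusion_site(sites["SalI"], sites["XhoI"])
    print(joined, regenerated(joined, sites))
```

For SalI/XhoI the sketch gives `GTCGAG`, the top strand shown in item (A), and reports that neither enzyme recognizes the fused site, which is why such a junction survives later digests with either enzyme.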


The filling-in of a 5' overhang generated by a restriction enzyme is performed using a 
DNA polymerase, for example the Klenow fragment of DNA Polymerase I. If the overhang is to 
be filled in completely, then all four nucleotides, dATP, dCTP, dGTP, and dTTP, are included in 
the reaction. If the overhang is to be only partially filled in, then the appropriate nucleotides are 
omitted from the reaction. In item (B) above, the XhoI-digested DNA would be filled in by 
Klenow in the presence of dCTP and dTTP and by omitting dATP and dGTP. An order of 
cloning steps is determined that allows the use of sites about 150-500 bp apart. Note that a 
fragment must lack the recognition sequence for an enzyme only if that enzyme is used to clone 
the fragment. For example, the strategy for the construction of the "desired" Factor VIII coding 
sequence can use ApaLI in a number of different places, because of the order of assembly of the 
fragments: ApaLI is not used in any of the later cloning steps. 
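
The partial fill-in of item (B) can be worked through with a short Python sketch. It is an idealized model under stated assumptions (the polymerase extends the recessed 3' end one base at a time and stops at the first position whose nucleotide is absent); the function name `fill_in` is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def fill_in(overhang_5p, dntps):
    """Bases added to the recessed 3' end, written 5'->3'.

    overhang_5p: the single-stranded 5' overhang, written 5'->3'.
    dntps: the bases supplied in the reaction, e.g. "CT" for dCTP + dTTP.
    """
    added = []
    # The template is read from the base nearest the duplex outward,
    # i.e. from the 3' end of the overhang toward its 5' end.
    for template_base in reversed(overhang_5p):
        new_base = COMPLEMENT[template_base]
        if new_base not in dntps:
            break  # required nucleotide omitted: extension stops here
        added.append(new_base)
    return "".join(added)


if __name__ == "__main__":
    print(fill_in("TCGA", "CT"))    # XhoI overhang with dCTP + dTTP
    print(fill_in("GATC", "AG"))    # BamHI overhang with dATP + dGTP
    print(fill_in("GATC", "ACGT"))  # BamHI overhang, complete fill-in
```

Under these assumptions the three calls give `TC`, `GA` and `GATC`, matching the notations XhoI(+TC), BamHI(+GA) and BamHI(+GATC) used in items (B) and (D).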

If there is a region where no useful sites are available, then a sequence-independent 
strategy can be used: fragments are cloned into a DNA construct that contains recognition 
sequences for restriction enzymes that cleave outside of their recognition sequence, 
e.g. BseRI = GAGGAGNNNNNNNNNN^   (SEQ ID NO:5) 
             CTCCTCNNNNNNNN^NN   (SEQ ID NO:6) 

             DNA construct     cloning site     gene fragment 

The recognition sequence of the enzyme used to clone the fragment will be removed 
when the fragment is released by digestion with, e.g. BseRI, leaving a fragment consisting of 
100% of the desired sequence, which can then be ligated to a similarly generated adjacent gene 
fragment. 
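
As a rough illustration of why such an enzyme leaves none of its own recognition sequence in the released fragment, the Python sketch below locates the BseRI cut positions relative to the site, using the 10-base (top strand) and 8-base (bottom strand) offsets shown in the diagram above. The function name, the toy construct, and the restriction to sites in the top-strand orientation are assumptions for illustration.

```python
SITE = "GAGGAG"      # BseRI recognition sequence (top strand)
TOP_OFFSET = 10      # top strand is cut 10 bases 3' of the site
BOTTOM_OFFSET = 8    # bottom strand is cut 8 bases from the site (top-strand coordinates)


def bseri_cuts(top_strand):
    """Return (top_cut, bottom_cut) index pairs for each site found on the top strand.

    An index i means the cut falls between top_strand[i-1] and top_strand[i].
    Only sites written in the top-strand orientation are searched.
    """
    cuts = []
    start = top_strand.find(SITE)
    while start != -1:
        end = start + len(SITE)
        if end + TOP_OFFSET <= len(top_strand):
            cuts.append((end + TOP_OFFSET, end + BOTTOM_OFFSET))
        start = top_strand.find(SITE, start + 1)
    return cuts


if __name__ == "__main__":
    # Hypothetical construct: site + 10-base spacer + start of a gene fragment.
    construct = "GAGGAG" + "ACGTACGTAC" + "ATGCAGATCGAG"
    for top_cut, bottom_cut in bseri_cuts(construct):
        print(top_cut, bottom_cut, construct[top_cut:])
```

Both cut positions lie downstream of the recognition sequence, so everything to the right of the cuts is free of the `GAGGAG` site; the two-base stagger between the strands corresponds to the `NN` overhang in the diagram.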

The next step is to synthesize initial restriction fragments. 

The synthesis of the initial restriction fragments can be achieved in a number of ways, 
including, but not limited to: 

1. Chemical synthesis of the entire fragment. 

2. Synthesize two oligonucleotides that are complementary at their 3' ends, anneal them, 
and use DNA polymerase Klenow fragment, or equivalent, to extend, giving a double-stranded 
fragment. 

3. Synthesize a number of smaller oligonucleotides, kinase those oligos that have 
internal 5' ends, anneal all oligos and ligate, viz. 

5'        P        P        3' 

3'    P        P            5' 

Techniques 2 and 3 can be used in subsequent steps to join smaller fragments to each 
other. PCR can be used to increase the quantity of material for cloning, but it may lead to an 
increase in the number of mutations. If an error-free fragment is not obtained, then site-directed 
mutagenesis can be used to correct the best isolate. This is followed by concatenation of error- 
free fragments and sequencing of junctions to confirm their precision. 
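
Technique 2 above (two oligonucleotides complementary at their 3' ends, annealed and extended) can be sketched in Python as follows. The function names, the minimum overlap of 8 bases, and the toy sequences are assumptions for illustration; the sketch only predicts the top strand of the extended product and does not model annealing specificity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_overlap(fwd, rev, min_overlap=8):
    """Top strand of the duplex obtained by annealing fwd and rev at their 3' ends
    and extending both with a polymerase.

    fwd: top-strand oligo, written 5'->3'.
    rev: bottom-strand oligo, written 5'->3'.
    """
    rev_as_top = revcomp(rev)  # the same region expressed in top-strand letters
    longest = min(len(fwd), len(rev_as_top))
    for k in range(longest, min_overlap - 1, -1):
        if fwd[-k:] == rev_as_top[:k]:
            return fwd + rev_as_top[k:]
    raise ValueError("no 3' overlap of the required length")


if __name__ == "__main__":
    target = "ATGCAGATCGAGCTGAGCACCTGCTTC"  # hypothetical 27-base target
    fwd = target[:18]                        # top-strand oligo
    rev = revcomp(target[9:])                # bottom-strand oligo, 9-base 3' overlap
    print(extend_overlap(fwd, rev) == target)
```

The same overlap check can be used when planning technique 3, to confirm that each pair of adjacent oligonucleotides shares the intended complementary region before ordering the synthesis.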

Use 

The synthetic nucleic acid sequences of the invention are useful for expressing a protein 
normally expressed in a mammalian cell, or in cell culture (e.g., for commercial production of 
human proteins such as GH, tPA, GLP-1, EPO, α-galactosidase, β-glucoceramidase, 
α-iduronidase, α-L-iduronidase, glucosamine-N-sulfatase, alpha-N-acetylglucosaminidase, 
acetylcoenzyme A:α-glucosaminide-N-acetyltransferase, N-acetylglucosamine-6-sulfatase, 
β-galactosidase, N-acetylgalactosamine-6-sulfatase, β-glucuronidase, Factor VIII, and 
Factor IX). The synthetic nucleic acid sequences of the 
invention are also useful for gene therapy. For example, a synthetic nucleic acid sequence 
encoding a selected protein can be introduced directly, e.g., via non-viral cell transfection or via 
a vector, into a cell, e.g., a transformed or a non-transformed cell, which can express the protein 
to create a cell which can be administered to a patient in need of the protein. Such cell-based 
gene therapy techniques are described in greater detail in co-pending US applications: USSN 
08/334,797; USSN 08/231,439; USSN 08/334,455; and USSN 08/928,881, which are hereby 
expressly incorporated by reference in their entirety. 
Examples 


Construction of pXF8.61 

The fourteen gene fragments of the B-domain-deleted-FVIII optimized cDNA listed in 
Table 2 and shown in Figure 5 (Fragment A-Fragment N) were made as follows. 92 
oligonucleotides were made by oligonucleotide synthesis on an ABI 391 synthesizer (Perkin 
Elmer). The 92 oligonucleotides are listed in Table 3. Figure 5 shows how these 92 
oligonucleotides anneal to form the fourteen gene fragments of Table 2. For each strand of each 
gene fragment, the first oligonucleotide (i.e., the most 5') was manufactured with a 5'-hydroxyl 
terminus, and the subsequent oligonucleotides were manufactured as 5'-phosphorylated to allow 
the ligation of adjacent annealed oligonucleotides. For gene fragments A, B, C, F, G, J, K, L, M and 
N, six oligonucleotides were annealed, ligated, digested with EcoRI and HindIII and cloned into 
pUC18 digested with EcoRI and HindIII. For gene fragments D, E, H and I, eight 
oligonucleotides were annealed, ligated, digested with EcoRI and HindIII and cloned into pUC18 
digested with EcoRI and HindIII. This procedure generated fourteen different plasmids, 
pAM1A through pAM1N. 
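
The oligonucleotide counts in the preceding paragraph can be cross-checked with a few lines of Python. This is a bookkeeping sketch only; it assumes that each fragment's oligonucleotides are split evenly between the two strands (as the forward and reverse names in Table 3 suggest), and the names `SIX_OLIGO`, `EIGHT_OLIGO` and `oligo_plan` are hypothetical.

```python
SIX_OLIGO = "ABCFGJKLMN"   # fragments assembled from six oligonucleotides
EIGHT_OLIGO = "DEHI"       # fragments assembled from eight oligonucleotides


def oligo_plan():
    """Per-fragment count of oligos and of 5'-hydroxyl / 5'-phosphate termini."""
    plan = {}
    for frag in sorted(SIX_OLIGO + EIGHT_OLIGO):
        total = 6 if frag in SIX_OLIGO else 8
        # The 5'-most oligo of each of the two strands keeps a 5'-hydroxyl;
        # every other oligo is 5'-phosphorylated so that it can be ligated.
        plan[frag] = {"oligos": total, "hydroxyl": 2, "phosphate": total - 2}
    return plan


if __name__ == "__main__":
    plan = oligo_plan()
    print(len(plan), "fragments")
    print(sum(p["oligos"] for p in plan.values()), "oligonucleotides")
    print(sum(p["phosphate"] for p in plan.values()), "5'-phosphorylated")
```

Ten six-oligonucleotide fragments and four eight-oligonucleotide fragments account for the 92 oligonucleotides stated above; under the even-split assumption, 64 of them carry a 5'-phosphate.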

Table 2 


Fragment   5' end                 3' end                 Note 

A          NheI 1                 ApaI 279 
B          ApaI 279               PmlI 544 
C          PmlI 544               PmlI 829 
D          PmlI 829               BglII(/BamHI) 1172     BamHI site 3' to seq 
E          (BglII/)BamHI 1172     BglII 1583 
F          BglII 1583             KpnI 1817 
G          KpnI 1817              BamHI 2126 
H          BamHI 2126             PmlI 2491 
I          PmlI 2491              KpnI 3170              ΔBstEII 2661-2955 
J          BstEII 2661            BstEII 2955 
K          KpnI 3170              ApaI 3482 
L          ApaI 3482              SmaI(/EcoRV) 3772 
M          (SmaI/)EcoRV 3772      BstEII 4062 
N          BstEII 4062            SmaI 4348 



In Table 2 the restriction site positions are numbered by the first base of the palindrome; 
numbering begins at the NheI site. 
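
The positions in Table 2 can be checked for consistency with a short Python sketch: each fragment should begin at the site where the previous one ends, with fragment J filling the BstEII deletion noted for fragment I. The data are transcribed from Table 2; the names `TABLE2`, `I_DELETION` and `tiling_gaps` are hypothetical.

```python
# (fragment, position of 5' site, position of 3' site), transcribed from Table 2
TABLE2 = [
    ("A", 1, 279), ("B", 279, 544), ("C", 544, 829), ("D", 829, 1172),
    ("E", 1172, 1583), ("F", 1583, 1817), ("G", 1817, 2126), ("H", 2126, 2491),
    ("I", 2491, 3170), ("J", 2661, 2955), ("K", 3170, 3482), ("L", 3482, 3772),
    ("M", 3772, 4062), ("N", 4062, 4348),
]
I_DELETION = (2661, 2955)  # BstEII-BstEII region omitted from fragment I


def tiling_gaps(rows, nested=("J",)):
    """Return pairs of consecutive fragments whose end and start sites do not coincide.

    Fragments listed in `nested` lie inside another fragment and are skipped.
    """
    chain = [row for row in rows if row[0] not in nested]
    gaps = []
    for (name1, _, end), (name2, start, _) in zip(chain, chain[1:]):
        if end != start:
            gaps.append((name1, name2))
    return gaps


if __name__ == "__main__":
    print("gaps:", tiling_gaps(TABLE2))
    j = next(row for row in TABLE2 if row[0] == "J")
    print("J fills the deletion in I:", (j[1], j[2]) == I_DELETION)
```

With the values as listed, the thirteen outer fragments tile the sequence from the NheI site at 1 to the SmaI site at 4348 without gaps, and fragment J spans exactly the region deleted from fragment I.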


Table 3 


Oligo Name 

Oligo Length 

Oligonucleotide Sequence 

AM1Af1 

118 

GTAGAATTCGTAGGCTAGCATGCAGATCGAGCTGAGCACCTGCTTCTTCCTGTGCCTGCT 
GCGCTTCTGCTTCAGCGCCACCCGCCGCTACTACCTGGGCGCCGTGGAGCTGAGCTGG (SEQ 
ID NO:7) 

AM1Af2 

104 

GACTACATGCAGAGCGACCTGGGCGAGCTGCCCGTGGACGCCCGCTTCCCCCCCCGCGTG 
CCCAAGAGCTTCCCCTTCAACACCAGCGTGGTGTACAAGAAGAC (SEQ ID NO: 8) 

AM1Af3 

88 

CCTGTTCGTGGAGTTCACCGACCACCTGTTCAACATCGCCAAGCCCCGCCCCCCCTGGAT 
GGGCCTGCTGGGCCCCTACAAGCTTTAC (SEQ ID NO: 9) 

AM1Ar1 

119 

GTAAAGCTTGTAGGGGCCCAGCAGGCCCATCCAGGGGGGGCGGGGCTTGGCGATGTTGA 
ACAGGTGGTCGGTGAACTCCACGAACAGGGTCTTCTTGTACACCACGCTGGTGTTGAAGG 
(SEQ ID NO: 10) 

AM1Ar2 

107 

GGAAGCTCTTGGGCACGCGGGGGGGGAAGCGGGCGTCCACGGGCAGCTCGCCCAGGTCG 
CTCTGCATGTAGTCCCAGCTCAGCTCCACGGCGCCCAGGTAGTAGCGG (SEQ ID NO: 11) 

AM1Ar3 

84 

CGGGTGGCGCTGAAGCAGAAGCGCAGCAGGCACAGGAAGAAGCAGGTGCTCAGCTCGAT 
CTGCATGCTAGCCTACGAATTCTAC (SEQ ID NO: 12) 

AM1Bf1 

115 

GTAGAATTCGTAGGGGCCCCACCATCCAGGCCGAGGTGTACGACACCGTGGTGATCACCC 
TGAAGAACATGGCCAGCCACCCCGTGAGCCTGCACGCCGTGGGCGTGAGCTACTG (SEQ ID 
NO: 13) 

AM1Bf2 

103 

GAAGGCCAGCGAGGGCGCCGAGTACGACGACCAGACCAGCCAGCGCGAGAAGGAGGAC 
GACAAGGTGTTCCCCGGCGGCAGCCACACCTACGTGTGGCAGGTG (SEQ ID NO: 14) 

AM1Bf3 

79 

CTGAAGGAGAACGGCCCCATGGCCAGCGACCCCCTGTGCCTGACCTACAGCTACCTGAGC 
CACGTGCTACAAGCTTTAC (SEQ ID NO: 15) 

AM1Br1 

107 

GTAAAGCTTGTAGCACGTGGCTCAGGTAGCTGTAGGTCAGGCACAGGGGGTCGCTGGCC 
ATGGGGCCGTTCTCCTTCAGCACCTGCCACACGTAGGTGTGGCTGCCG(SEQ ID NO: 16) 

AM1Br2 

101 

CCGGGGAACACCTTGTCGTCCTCCTTCTCGCGCTGGCTGGTCTGGTCGTCGTACTCGGCGC 
CCTCGCTGGCCTTCCAGTAGCTCACGCCCACGGCGTGCAG (SEQ ID NO: 17) 

AM1Br3 

89 

GCTCACGGGGTGGCTGGCCATGTTCTTCAGGGTGATCACCACGGTGTCGTACACCTCGGC 
CTGGATGGTGGGGCCCCTACGAATTCTAC (SEQ ID NO: 18) 

AM1Cf1 

122 

GTAGAATTCGTAGCCACGTGGACCTGGTGAAGGACCTGAACAGCGGCCTGATCGGCGCC 
CTGCTGGTGTGCCGCGAGGGCAGCCTGGCCAAGGAGAAGACCCAGACCCTGCACAAGTTC 
ATC (SEQ ID NO: 19) 

AM1Cf2 

110 

CTGCTGTTCGCCGTGTTCGACGAGGGCAAGAGCTGGCACAGCGAGACCAAGAACAGCCT 
GATGCAGGACCGCGACGCCGCCAGCGCCCGCGCCTGGCCCAAGATGCACAC (SEQ ID NO: 
20) 

AM1Cf3 

86 

CGTGAACGGCTACGTGAACCGCAGCCTGCCCGGCCTGATCGGCTGCCACCGCAAGAGCG 
TGTACTGGCACGTGCTACAAGCTTTAC (SEQ ID NO: 21) 

AM1Cr1 

108 

GTAAAGCTTGTAGCACGTGCCAGTACACGCTCTTGCGGTGGCAGCCGATCAGGCCGGGCA 
GGCTGCGGTTCACGTAGCCGTTCACGGTGTGCATCTTGGGCCAGGCGC (SEQ ID NO: 22) 

AM1Cr2 

110 

GGGCGCTGGCGGCGTCGCGGTCCTGCATCAGGCTGTTCTTGGTCTCGCTGTGCCAGCTCTT 
GCCCTCGTCGAACACGGCGAACAGCAGGATGAACTTGTGCAGGGTCTGG (SEQ ID NO: 23) 

AM1Cr3 

100 

GTCTTCTCCTTGGCCAGGCTGCCCTCGCGGCACACCAGCAGGGCGCCGATCAGGCCGCTG 
TTCAGGTCCTTCACCAGGTCCACGTGGCTACGAATTCTAC (SEQ ID NO: 24) 

AM1Df1 

99 

GTAGAATTCGTAGCACGTGATCGGCATGGGCACCACCCCCGAGGTGCACAGCATCTTCCT 
GGAGGGCCACACCTTCCTGGTGCGCAACCACCGCCAGGC (SEQ ID NO: 25) 

AM1Df2 

100 

CAGCCTGGAGATCAGCCCCATCACCTTCCTGACCGCCCAGACCCTGCTGATGGACCTGGG 
CCAGTTCCTGCTGTTCTGCCACATCAGCAGCCACCAGCAC (SEQ ID NO: 26) 

AM1Df3 

101 

GACGGCATGGAGGCCTACGTGAAGGTGGACAGCTGCCCCGAGGAGCCCCAGCTGCGCAT 
GAAGAACAACGAGGAGGCCGAGGACTACGACGACGACCTGAC (SEQ ID NO: 27) 

AM1Df4 

84 

CGACAGCGAGATGGACGTGGTGCGCTTCGACGACGACAACAGCCCCAGCTTCATCCAGA 
TCTCTACGGATCCTACAAGCTTTAC (SEQ ID NO: 28) 

AM1Dr1 

109 

GTAAAGCTTGTAGGATCCGTAGAGATCTGGATGAAGCTGGGGCTGTTGTCGTCGTCGAAG 
CGCACCACGTCCATCTCGCTGTCGGTCAGGTCGTCGTCGTAGTCCTCGG (SEQ ID NO: 29) 

AM1Dr2 

101 

CCTCCTCGTTGTTCTTCATGCGCAGCTGGGGCTCCTCGGGGCAGCTGTCCACCTTCACGTA 
GGCCTCCATGCCGTCGTGCTGGTGGCTGCTGATGTGGCAG (SEQ ID NO: 30) 

AM1Dr3 

102 

AACAGCAGGAACTGGCCCAGGTCCATCAGCAGGGTCTGGGCGGTCAGGAAGGTGATGGG 
GCTGATCTCCAGGCTGGCCTGGCGGTGGTTGCGCACCAGGAAG (SEQ ID NO: 31) 

AM1Dr4 

72 

GTGTGGCCCTCCAGGAAGATGCTGTGCACCTCGGGGGTGGTGCCCATGCCGATCACGTGC 
TACGAATTCTAC (SEQ ID NO: 32) 

AM1Ef1 

122 

GTAGAATTCGTAGGGATCCGCAGCGTGGCCAAGAAGCACCCCAAGACCTGGGTGCACTA 
CATCGCCGCCGAGGAGGAGGACTGGGACTACGCCCCCCTGGTGCTGGCCCCCGACGACCGC 
AG (SEQ ID NO: 33) 

AM1Ef2 

120 

CTACAAGAGCCAGTACCTGAACAACGGCCCCCAGCGCATCGGCCGCAAGTACAAGAAGG 
TGCGCTTCATGGCCTACACCGACGAGACCTTCAAGACCCGCGAGGCCATCCAGCACGAGAG 
(SEQ ID NO: 34) 

AM1Ef3 

115 

CGGCATCCTGGGCCCCCTGCTGTACGGCGAGGTGGGCGACACCCTGCTGATCATCTTCAA 
GAACCAGGCCAGCCGCCCCTACAACATCTACCCCCACGGCATCACCGACGTGCGC (SEQ ID 
NO: 35) 

AM1Ef4 

86 

CCCCTGTACAGCCGCCGCCTGCCCAAGGGCGTGAAGCACCTGAAGGACTTCCCCATCCTG 
CCCGGCGAGATCTCTACAAGCTTTAC (SEQ ID NO: 36) 

AM1Er1 

109 

GTAAAGCTTGTAGAGATCTCGCCGGGCAGGATGGGGAAGTCCTTCAGGTGCTTCACGCCC 
TTGGGCAGGCGGCGGCTGTACAGGGGGCGCACGTCGGTGATGCCGTGGG (SEQ ID NO: 37) 

AM1Er2 

114 

GGTAGATGTTGTAGGGGCGGCTGGCCTGGTTCTTGAAGATGATCAGCAGGGTGTCGCCCA 
CCTCGCCGTACAGCAGGGGGCCCAGGATGCCGCTCTCGTGCTGGATGGCCTCGC (SEQ ID 
NO: 38) 

AM1Er3 

121 

GGGTCTTGAAGGTCTCGTCGGTGTAGGCCATGAAGCGCACCTTCTTGTACTTGCGGCCGA 
TGCGCTGGGGGCCGTTGTTCAGGTACTGGCTCTTGTAGCTGCGGTCGTCGGGGGCCAGCAC 
(SEQ ID NO: 39) 

AM1Er4 

99 

CAGGGGGGCGTAGTCCCAGTCCTCCTCCTCGGCGGCGATGTAGTGCACCCAGGTCTTGGG 
GTGCTTCTTGGCCACGCTGCGGATCCCTACGAATTCTAC (SEQ ID NO: 40) 

AM1Ff1 

102 

GTAGAATTCGTAGAGATCTTCAAGTACAAGTGGACCGTGACCGTGGAGGACGGCCCCAC 
CAAGAGCGACCCCCGCTGCCTGACCCGCTACTACAGCAGCTTC (SEQ ID NO: 41) 

AM1Ff2 

103 

GTGAACATGGAGCGCGACCTGGCCAGCGGCCTGATCGGCCCCCTGCTGATCTGCTACAAG 
GAGAGCGTGGACCAGCGCGGCAACCAGATCATGAGCGACAAGC (SEQ ID NO: 42) 

AM1Ff3 

61 

GCAACGTGATCCTGTTCAGCGTGTTCGACGAGAACCGCAGCTGGTACCCTACAAGCTTTA 
C(SEQ ID NO: 43) 

AM1Fr1 

87 

GTAAAGCTTGTAGGGTACCAGCTGCGGTTCTCGTCGAACACGCTGAACAGGATCACGTTG 
CGCTTGTCGCTCATGATCTGGTTGCCG (SEQ ID NO: 44) 
AM1Fr2 

101 

CGCTGGTCCACGCTCTCCTTGTAGCAGATCAGCAGGGGGCCGATCAGGCCGCTGGCCAGG 
TCGCGCTCCATGTTCACGAAGCTGCTGTAGTAGCGGGTCAG (SEQ ID NO: 45) 

AM1Fr3 

78 

GCAGCGGGGGTCGCTCTTGGTGGGGCCGTCCTCCACGGTCACGGTCCACTTGTACTTGAA 
GATCTCTACGAATTCTAC (SEQ ID NO: 46) 

AM1Gf1 

120 

GTAGAATTCGTAGGGTACCTGACCGAGAACATCCAGCGCTTCCTGCCCAACCCCGCCGGC 
GTGCAGCTGGAGGACCCCGAGTTCCAGGCCAGCAACATCATGCACAGCATCAACGGCTAC 
(SEQ ID NO: 47) 

AM1Gf2 

126 

GTGTTCGACAGCCTGCAGCTGAGCGTGTGCCTGCACGAGGTGGCCTACTGGTACATCCTG 
AGCATCGGCGCCCAGACCGACTTCCTGAGCGTGTTCTTCAGCGGCTACACCTTCAAGCACA 
AGATG (SEQ ID NO: 48) 

AM1Gf3 

95 

GTGTACGAGGACACCCTGACCCTGTTCCCCTTCAGCGGCGAGACCGTGTTCATGAGCATG 
GAGAACCCCGGCCTGTGGATCCCTACAAGCTTTAC (SEQ ID NO: 49) 

AM1Gr1 

119 

GTAAAGCTTGTAGGGATCCACAGGCCGGGGTTCTCCATGCTCATGAACACGGTCTCGCCG 
CTGAAGGGGAACAGGGTCAGGGTGTCCTCGTACACCATCTTGTGCTTGAAGGTGTAGCC 
(SEQ ID NO: 50) 

AM1Gr2 

124 

GCTGAAGAACACGCTCAGGAAGTCGGTCTGGGCGCCGATGCTCAGGATGTACCAGTAGG 
CCACCTCGTGCAGGCACACGCTCAGCTGCAGGCTGTCGAACACGTAGCCGTTGATGCTGTG 
CATG (SEQ ID NO: 51) 

AM1Gr3 

98 

ATGTTGCTGGCCTGGAACTCGGGGTCCTCCAGCTGCACGCCGGCGGGGTTGGGCAGGAA 
GCGCTGGATGTTCTCGGTCAGGTACCCTACGAATTCTAC (SEQ ID NO: 52) 

AM1Hf1 

111 

GTAGAATTCGTAGGGATCCTGGGCTGCCACAACAGCGACTTCCGCAACCGCGGCATGACC 
GCCCTGCTGAAGGTGAGCAGCTGCGACAAGAACACCGGCGACTACTACGAG (SEQ ID NO: 
53) 

AM1Hf2 

102 

GACAGCTACGAGGACATCAGCGCCTACCTGCTGAGCAAGAACAACGCCATCGAGCCCCG 
CCTGGAGGAGATCACCCGCACCACCCTGCAGAGCGACCAGGAG (SEQ ID NO: 54) 

AM1Hf3 

105 

GAGATCGACTACGACGACACCATCAGCGTGGAGATGAAGAAGGAGGACTTCGACATCTA 
CGACGAGGACGAGAACCAGAGCCCCCGCAGCTTCCAGAAGAAGACC (SEQ ID NO: 55) 

AM1Hf4 

79 

CGCCACTACTTCATCGCCGCCGTGGAGCGCCTGTGGGACTACGGCATGAGCAGCAGCCCC 
CACGTGCTACAAGCTTTAC (SEQ ID NO: 56) 

AM1Hr1 

101 

GTAAAGCTTGTAGCACGTGGGGGCTGCTGCTCATGCCGTAGTCCCACAGGCGCTCCACGG 
CGGCGATGAAGTAGTGGCGGGTCTTCTTCTGGAAGCTGCGG (SEQ ID NO: 57) 

AM1Hr2 

105 

GGGCTCTGGTTCTCGTCCTCGTCGTAGATGTCGAAGTCCTCCTTCTTCATCTCCACGCTGA 
TGGTGTCGTCGTAGTCGATCTCCTCCTGGTCGCTCTGCAGGGTG (SEQ ID NO: 58) 

AM1Hr3 

108 

GTGCGGGTGATCTCCTCCAGGCGGGGCTCGATGGCGTTGTTCTTGCTCAGCAGGTAGGCG 
CTGATGTCCTCGTAGCTGTCCTCGTAGTAGTCGCCGGTGTTCTTGTCG (SEQ ID NO: 59) 

AM1Hr4 

83 

CAGCTGCTCACCTTCAGCAGGGCGGTCATGCCGCGGTTGCGGAAGTCGCTGTTGTGGCAG 
CCCAGGATCCCTACGAATTCTAC (SEQ ID NO: 60) 

AM1If1 

115 

GTAGAATTCGTAGCACGTGCTGCGCAACCGCGCCCAGAGCGGCAGCGTGCCCCAGTTCA 
AGAAGGTGGTGTTCCAGGAGTTCACCGACGGCAGCTTCACCCAGCCCCTGTACCGC(SEQ 
ID NO: 61) 

AM1If2 

111 

GGCGAGCTGAACGAGCACCTGGGCCTGCTGGGCCCCTACATCCGCGCCGAGGTGGAGGA 
CAACATCATGGTGACCGTGCAGGAGTTCGCCCTGTTCTTCACCATCTTCGAC (SEQ ID NO: 
62) 

AM1If3 

106 

GAGACCAAGAGCTGGTACTTCACCGAGAACATGGAGCGCAACTGCCGCGCCCCCTGCAA 
CATCCAGATGGAGGACCCCACCTTCAAGGAGAACTACCGCTTCCACG (SEQ ID NO: 63) 

AM1If4 

85 

CCATCAACGGCTACATCATGGACACCCTGCCCGGCCTGGTGATGGCCCAGGACCAGCGCA 
TCCGCTGGTACCCTACAAGCTTTAC (SEQ ID NO: 64) 

AM1Ir1 

115 

GTAAAGCTTGTAGGGTACCAGCGGATGCGCTGGTCCTGGGCCATCACCAGGCCGGGCAG 
GGTGTCCATGATGTAGCCGTTGATGGCGTGGAAGCGGTAGTTCTCCTTGAAGGTGG (SEQ ID 
NO: 65) 

AM1Ir2 

99 

GGTCCTCCATCTGGATGTTGCAGGGGGCGCGGCAGTTGCGCTCCATGTTCTCGGTGAAGT 
ACCAGCTCTTGGTCTCGTCGAAGATGGTGAAGAACAGGG (SEQ ID NO: 66) 

AM1Ir3 

110 

CGAACTCCTGCACGGTCACCATGATGTTGTCCTCCACCTCGGCGCGGATGTAGGGGCCCA 
GCAGGCCCAGGTGCTCGTTCAGCTCGCCGCGGTACAGGGGCTGGGTGAAG (SEQ ID NO: 67) 
AM1Ir4 

93 

CTGCCGTCGGTGAACTCCTGGAACACCACCTTCTTGAACTGGGGCACGCTGCCGCTCTGG 
GCGCGGTTGCGCAGCACGTGCTACGAATTCTAC (SEQ ID NO: 68) 

AM1Jf1 

116 

GTAGAATTCGTAGGGTGACCTTCCGCAACCAGGCCAGCCGCCCCTACAGCTTCTACAGCA 
GCCTGATCAGCTACGAGGAGGACCAGCGCCAGGGCGCCGAGCCCCGCAAGAACTTC (SEQ 
ID NO: 69) 

AM1Jf2 

120 

GTGAAGCCCAACGAGACCAAGACCTACTTCTGGAAGGTGCAGCACCACATGGCCCCCAC 
CAAGGACGAGTTCGACTGCAAGGCCTGGGCCTACTTCAGCGACGTGGACCTGGAGAAGGA 
C(SEQ ID NO: 70) 

AM1Jf3 

91 

GTGCACAGCGGCCTGATCGGCCCCCTGCTGGTGTGCCACACCAACACCCTGAACCCCGCC 
CACGGCCGCCAGGTGACCCTACAAGCTTTAC (SEQ ID NO: 71) 

AM1Jr1 

113 

GTAAAGCTTGTAGGGTCACCTGGCGGCCGTGGGCGGGGTTCAGGGTGTTGGTGTGGCACA 
CCAGCAGGGGGCCGATCAGGCCGCTGTGCACGTCCTTCTCCAGGTCCACGTCG (SEQ ID NO: 
72) 

AM1Jr2 

121 

CTGAAGTAGGCCCAGGCCTTGCAGTCGAACTCGTCCTTGGTGGGGGCCATGTGGTGCTGC 
ACCTTCCAGAAGTAGGTCTTGGTCTCGTTGGGCTTCACGAAGTTCTTGCGGGGCTCGGCGC 
(SEQ ID NO: 73) 

AM1Jr3 

93 

CCTGGCGCTGGTCCTCCTCGTAGCTGATCAGGCTGCTGTAGAAGCTGTAGGGGCGGCTGG 
CCTGGTTGCGGAAGGTCACCCTACGAATTCTAC (SEQ ID NO: 74) 

AM1Kf1 

120 

GTAGAATTCGTAGGGTACCTGCTGAGCATGGGCAGCAACGAGAACATCCACAGCATCCA 
CTTCAGCGGCCACGTGTTCACCGTGCGCAAGAAGGAGGAGTACAAGATGGCCCTGTACAAC 
(SEQ ID NO: 75) 

AM1Kf2 

122 

CTGTACCCCGGCGTGTTCGAGACCGTGGAGATGCTGCCCAGCAAGGCCGGCATCTGGCGC 
GTGGAGTGCCTGATCGGCGAGCACCTGCACGCCGGCATGAGCACCCTGTTCCTGGTGTACA 
G (SEQ ID NO: 76) 

AM1Kf3 

102 

CAACAAGTGCCAGACCCCCCTGGGCATGGCCAGCGGCCACATCCGCGACTTCCAGATCAC 
CGCCAGCGGCCAGTACGGCCAGTGGGCCCCTACAAGCTTTAC (SEQ ID NO: 77) 

AM1Kr1 

123 

GTAAAGCTTGTAGGGGCCCACTGGCCGTACTGGCCGCTGGCGGTGATCTGGAAGTCGCGG 
ATGTGGCCGCTGGCCATGCCCAGGGGGGTCTGGCACTTGTTGCTGTACACCAGGAACAGGG 
TG (SEQ ID NO: 78) 

AM1Kr2 

125 

CTCATGCCGGCGTGCAGGTGCTCGCCGATCAGGCACTCCACGCGCCAGATGCCGGCCTTG 
CTGGGCAGCATCTCCACGGTCTCGAACACGCCGGGGTACAGGTTGTACAGGGCCATCTTGT 
ACTC (SEQ ID NO: 79) 

AM1Kr3 

96 

CTCCTTCTTGCGCACGGTGAACACGTGGCCGCTGAAGTGGATGCTGTGGATGTTCTCGTT 
GCTGCCCATGCTCAGCAGGTACCCTACGAATTCTAC (SEQ ID NO: 80) 

AM1Lf1 

120 

GTAGAATTCGTAGGGGCCCCCAAGCTGGCCCGCCTGCACTACAGCGGCAGCATCAACGC 
CTGGAGCACCAAGGAGCCCTTCAGCTGGATCAAGGTGGACCTGCTGGCCCCCATGATCATC 
(SEQ ID NO: 81) 

AM1Lf2 

116 

CACGGCATCAAGACCCAGGGCGCCCGCCAGAAGTTCAGCAGCCTGTACATCAGCCAGTT 
CATCATCATGTACAGCCTGGACGGCAAGAAGTGGCAGACCTACCGCGGCAACAGCAC (SEQ 
ID NO: 82) 

AM1Lf3 

86 

CGGCACCCTGATGGTGTTCTTCGGCAACGTGGACAGCAGCGGCATCAAGCACAACATCTT 
CAACCCCCCCGGGCTACAAGCTTTAC (SEQ ID NO: 83) 

AM1Lr1 

110 

GTAAAGCTTGTAGCCCGGGGGGGTTGAAGATGTTGTGCTTGATGCCGCTGCTGTCCACGT 
TGCCGAAGAACACCATCAGGGTGCCGGTGCTGTTGCCGCGGTAGGTCTGC (SEQ ID NO: 84) 

AM1Lr2 

113 

CACTTCTTGCCGTCCAGGCTGTACATGATGATGAACTGGCTGATGTACAGGCTGCTGAAC 
TTCTGGCGGGCGCCCTGGGTCTTGATGCCGTGGATGATCATGGGGGCCAGCAG (SEQ ID NO: 
85) 

AM1Lr3 

99 

GTCCACCTTGATCCAGCTGAAGGGCTCCTTGGTGCTCCAGGCGTTGATGCTGCCGCTGTA 
GTGCAGGCGGGCCAGCTTGGGGGCCCCTACGAATTCTAC (SEQ ID NO: 86) 

AM1Mf1 

122 

GTAGAATTCGTAGGATATCATCGCCCGCTACATCCGCCTGCACCCCACCCACTACAGCAT 
CCGCAGCACCCTGCGCATGGAGCTGATGGGCTGCGACCTGAACAGCTGCAGCATGCCCCTG 
G (SEQ ID NO: 87) 

AM1Mf2 

112 

GCATGGAGAGCAAGGCCATCAGCGACGCCCAGATCACCGCCAGCAGCTACTTCACCAAC 
ATGTTCGCCACCTGGAGCCCCAGCAAGGCCCGCCTGCACCTGCAGGGCCGCAG (SEQ ID 
NO: 88) 

AM1Mf3 

89 

CAACGCCTGGCGCCCCCAGGTGAACAACCCCAAGGAGTGGCTGCAGGTGGACTTCCAGA 
AGACCATGAAGGTGACCCTACAAGCTTTAC (SEQ ID NO: 89) 

AM1Mr1 

112 

GTAAAGCTTGTAGGGTCACCTTCATGGTCTTCTGGAAGTCCACCTGCAGCCACTCCTTGG 
GGTTGTTCACCTGGGGGCGCCAGGCGTTGCTGCGGCCCTGCAGGTGCAGGCG (SEQ ID NO: 
90) 

AM1Mr2 

114 

GGCCTTGCTGGGGCTCCAGGTGGCGAACATGTTGGTGAAGTAGCTGCTGGCGGTGATCTG 
GGCGTCGCTGATGGCCTTGCTCTCCATGCCCAGGGGCATGCTGCAGCTGTTCAG (SEQ ID 
NO: 91) 

AM1Mr3 

97 

GTCGCAGCCCATCAGCTCCATGCGCAGGGTGCTGCGGATGCTGTAGTGGGTGGGGTGCAG 
GCGGATGTAGCGGGCGATGATATCCTACGAATTCTAC (SEQ ID NO: 92) 

AM1Nf1 

122 

GTAGAATTCGTAGGGTGACCGGCGTGACCACCCAGGGCGTGAAGAGCCTGCTGACCAGC 
ATGTACGTGAAGGAGTTCCTGATCAGCAGCAGCCAGGACGGCCACCAGTGGACCCTGTTCT 
TC (SEQ ID NO: 93) 

AM1Nf2 

104 

CAGAACGGCAAGGTGAAGGTGTTCCAGGGCAACCAGGACAGCTTCACCCCCGTGGTGAA 
CAGCCTGGACCCCCCCCTGCTGACCCGCTACCTGCGCATCCACCC (SEQ ID NO: 94) 

AM1Nf3 

92 

CCAGAGCTGGGTGCACCAGATCGCCCTGCGCATGGAGGTGCTGGGCTGCGAGGCCCAGG 
ACCTGTACTAGCTGCCCGGGCTACAAGCTTTAC (SEQ ID NO: 95) 

AM1Nr1 

118 

GTAAAGCTTGTAGCCCGGGCAGCTAGTACAGGTCCTGGGCCTCGCAGCCCAGCACCTCCA 
TGCGCAGGGCGATCTGGTGCACCCAGCTCTGGGGGTGGATGCGCAGGTAGCGGGTCAG 
(SEQ ID NO: 96) 

AM1Nr2 

100 

CAGGGGGGGGTCCAGGCTGTTCACCACGGGGGTGAAGCTGTCCTGGTTGCCCTGGAACA 
CCTTCACCTTGCCGTTCTGGAAGAACAGGGTCCACTGGTGG (SEQ ID NO: 97) 

AM1Nr3 

100 

CCGTCCTGGCTGCTGCTGATCAGGAACTCCTTCACGTACATGCTGGTCAGCAGGCTCTTCA 
CGCCCTGGGTGGTCACGCCGGTCACCCTACGAATTCTAC (SEQ ID NO: 98) 
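
One simple sanity check on Table 3 is that the forward and reverse oligonucleotides of a fragment add up to the same total length, which is what one would expect if the annealed strands end flush with each other. The Python sketch below applies this check to the stated lengths of fragments A through D only; the flush-end assumption and the names `STATED_LENGTHS` and `strand_totals` are not from the specification.

```python
# Stated oligo lengths from Table 3: fragment -> (forward oligos, reverse oligos)
STATED_LENGTHS = {
    "A": ([118, 104, 88], [119, 107, 84]),
    "B": ([115, 103, 79], [107, 101, 89]),
    "C": ([122, 110, 86], [108, 110, 100]),
    "D": ([99, 100, 101, 84], [109, 101, 102, 72]),
}


def strand_totals(lengths):
    """Total length of the forward and of the reverse strand for each fragment."""
    return {frag: (sum(fwd), sum(rev)) for frag, (fwd, rev) in lengths.items()}


def mismatched(lengths):
    """Fragments whose two strands do not add up to the same length."""
    return [frag for frag, (f, r) in strand_totals(lengths).items() if f != r]


if __name__ == "__main__":
    for frag, (fwd_total, rev_total) in strand_totals(STATED_LENGTHS).items():
        print(frag, fwd_total, rev_total)
    print("mismatched:", mismatched(STATED_LENGTHS))
```

For these four fragments the two strands agree (310, 297, 318 and 384 bases respectively). The same check can be extended to the remaining fragments, or run against the sequences themselves, to catch transcription errors in a long oligonucleotide table.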


As noted in Table 2 and shown in Figure 5, fragment D was constructed with a BamHI 
restriction site placed between the BglII site and the HindIII site at the 3' end of the fragment. 
Fragment I was constructed to carry the DNA from PmlI (2491) to BstEII (2661) followed 
immediately by the DNA from BstEII (2955) to KpnI (3170), so that the insertion of the BstEII 
fragment from pAM1J into the BstEII site of pAM1I in the correct orientation will generate the 
desired sequences from 2491 to 3170. Plasmid pAM1B was digested with ApaI and HindIII and 
the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM1A digested 
with ApaI and HindIII, generating plasmid pAM1AB. Plasmid pAM1D was digested with PmlI 
and HindIII and the insert was purified by agarose gel electrophoresis and inserted into plasmid 
pAM1AB digested with PmlI and HindIII, generating plasmid pAM1ABD. Plasmid pAM1C 
was digested with PmlI and the insert was purified by agarose gel electrophoresis and inserted 
into plasmid pAM1ABD digested with PmlI, generating plasmid pAM1ABCD; insert orientation 
was confirmed by the appearance of a diagnostic 111 bp fragment when digested with MscI. 
Plasmid pAM1F was digested with BglII and HindIII and the insert was purified by agarose gel 
electrophoresis and inserted into plasmid pAM1E digested with BglII and HindIII, generating 
plasmid pAM1EF. Plasmid pAM1G was digested with KpnI and HindIII and the insert was 
purified by agarose gel electrophoresis and inserted into plasmid pAM1EF digested with KpnI 
and HindIII, generating plasmid pAM1EFG. Plasmid pAM1J was digested with BstEII and the 
insert was purified by agarose gel electrophoresis and inserted into plasmid pAM1I digested with 
BstEII, generating plasmid pAM1IJ; orientation was confirmed by the appearance of a diagnostic 
465 bp fragment when digested with EcoRI and EagI. Plasmid pAM1IJ was digested with PmlI 
and HindIII and the insert was purified by agarose gel electrophoresis and inserted into plasmid 
pAM1H digested with PmlI and HindIII, generating plasmid pAM1HIJ. Plasmid pAM1M was 
digested with EcoRI and BstEII and the insert was purified by agarose gel electrophoresis and 
inserted into plasmid pAM1N digested with EcoRI and BstEII, generating plasmid pAM1MN. 
Plasmid pAM1L was digested with EcoRI and SmaI and the insert was purified by agarose gel 
electrophoresis and inserted into plasmid pAM1MN digested with EcoRI and EcoRV, generating 
plasmid pAM1LMN. Plasmid pAM1LMN was digested with ApaI and HindIII and the insert 
was purified by agarose gel electrophoresis and inserted into plasmid pAM1K digested with 
ApaI and HindIII, generating plasmid pAM1KLMN. Plasmid pAM1EFG was digested with 
BamHI and the insert was purified by agarose gel electrophoresis and inserted into plasmid 
pAM1ABCD digested with BamHI and BglII, generating plasmid pAM1ABCDEFG; orientation 
was confirmed by the appearance of a diagnostic 552 bp fragment when digested with BglII and 
HindIII. Plasmid pAM1KLMN was digested with KpnI and HindIII and the insert was purified 
by agarose gel electrophoresis and inserted into plasmid pAM1HIJ digested with KpnI and 
HindIII, generating plasmid pAM1HIJKLMN. Plasmid pAM1HIJKLMN was digested with 
BamHI and HindIII and the insert was purified by agarose gel electrophoresis and inserted into 
plasmid pAM1ABCDEFG digested with BamHI and HindIII, generating plasmid pAM1-1. 
These cloning steps are depicted in Figure 6. Figure 7 shows the DNA sequence of the insert 
contained in pAM1-1 (SEQ ID NO:1). This insert can be cloned into any suitable expression 
vector as a NheI-SmaI fragment to generate an expression construct. pXF8.61 (Fig. 4), pXF8.38 
(Fig. 11) and pXF8.224 (Fig. 13) are examples of such a construct. 
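
The order of the subcloning steps described above can be sketched as a small bookkeeping script. This is only an illustrative model, not part of the described method: it tracks which of the fragments A through N each intermediate plasmid carries and ignores restriction sites, insert position and orientation. The plasmid names are normalized to the pattern "pAM1" plus fragment letters, and the reading of the I/J intermediate as pAM1IJ is an assumption.

```python
# Each step is (insert donor, recipient plasmid, product), in the order
# given in the text. Names are normalized as "pAM1" + fragment letters
# (an assumption made for this sketch).
STEPS = [
    ("pAM1B", "pAM1A", "pAM1AB"),
    ("pAM1D", "pAM1AB", "pAM1ABD"),
    ("pAM1C", "pAM1ABD", "pAM1ABCD"),
    ("pAM1F", "pAM1E", "pAM1EF"),
    ("pAM1G", "pAM1EF", "pAM1EFG"),
    ("pAM1J", "pAM1I", "pAM1IJ"),
    ("pAM1IJ", "pAM1H", "pAM1HIJ"),
    ("pAM1M", "pAM1N", "pAM1MN"),
    ("pAM1L", "pAM1MN", "pAM1LMN"),
    ("pAM1LMN", "pAM1K", "pAM1KLMN"),
    ("pAM1EFG", "pAM1ABCD", "pAM1ABCDEFG"),
    ("pAM1KLMN", "pAM1HIJ", "pAM1HIJKLMN"),
    ("pAM1HIJKLMN", "pAM1ABCDEFG", "pAM1-1"),
]


def simulate(steps):
    """Return a dict mapping each plasmid name to the set of fragments it carries."""
    # Start from the fourteen single-fragment plasmids pAM1A ... pAM1N.
    contents = {f"pAM1{letter}": {letter} for letter in "ABCDEFGHIJKLMN"}
    for donor, recipient, product in steps:
        # A step can only use plasmids that already exist at that point.
        for name in (donor, recipient):
            if name not in contents:
                raise ValueError(f"{name} is used before it is made")
        contents[product] = contents[donor] | contents[recipient]
    return contents


contents = simulate(STEPS)
print("".join(sorted(contents["pAM1-1"])))
```

Because a step fails if it uses a plasmid that has not yet been built, running the script also confirms that the steps are listed in a workable order.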

Construction of pXF8.186 

The "LE" version of the B-domain-deleted-FVIII optimized cDNA contained in pAM1-1 was 
modified by replacing the Leu-Glu dipeptide (2284-2289) at the junction of the heavy and light 
chains with four Arginine residues, making a total of five consecutive Arginine residues (SEQ 
ID NO:2). This was achieved as follows. The six oligonucleotides shown in Table 4 were 
annealed, ligated, digested with EcoRI and HindIII and cloned into pUC18 digested with EcoRI 
and HindIII, generating the plasmid pAM8B. Figure 8 shows how these oligonucleotides anneal 
to form the requisite DNA sequence. pAM8B was digested with BamHI and BstXI and the 
230 bp insert was purified by agarose gel electrophoresis and used to replace the BamHI(2126)- 
BstXI(2352) fragment of the "LE" version (see Figure 7). Figure 9 shows the sequence of the 
resulting cDNA (SEQ ID NO:2). This "5Arg" version of the B-domain-deleted-FVIII optimized 
cDNA can be cloned into any suitable expression vector as a NheI-SmaI fragment to generate 
an expression construct. pXF8.186 (Figure 3) is an example of such a construct. 

Table 4 


OLIGO NAME 

OLIGO LENGTH 

OLIGONUCLEOTIDE SEQUENCE 

AM8F1 

140 

GTAGAATTCGGATCCTGGGCTGCCACAACAGCGACTT 
CCGCAACCGCGGCATGACCGCCCTGCTGAAGGTGAGC 
AGCTGCGACAAGAACACCGGCGACTACTACGAGGAC 
AGCTACGAGGACATCAGCGCCTACCTGCTG (SEQ ID 
NO:99) 

AM8BF2 

57 

AGCAAGAACAACGCCATCGAGCCCCGCAGGCGCAGG 
CGCGAGATCACCCGCACCACC (SEQ ID NO: 100) 

AM8F4 

58 

CTGCAGAGCGACCAGGAGGAGATCGACTACGACGAC 
ACCATCAGCGTGGAAGCTTTAC (SEQ ID NO:101) 

AM8R1 

79 

GTAAAGCTTCCACGCTGATGGTGTCGTCGTAGTCGAT 
CTCCTCCTGGTCGCTCTGCAGGGTGGTGCGGGTGATCT 
CGCG (SEQ ID NO:102) 

AM8BR2 

57 

CCTGCGCCTGCGGGGCTCGATGGCGTTGTTCTTGCTCA 
GCAGGTAGGCGCTGATGTC (SEQ ID NO: 103) 

AM8BR4 

119 

CTCGTAGCTGTCCTCGTAGTAGTCGCCGGTGTTCTTGT 
CGCAGCTGCTCACCTTCAGCAGGGCGGTCATGCCGCG 
GTTGCGGAAGTCGCTGTTGTGGCAGCCCAGGATCCGA 
ATTCTAC (SEQ ID NO: 104) 
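
A quick consistency check on the six Table 4 oligonucleotides can be sketched as follows. The sequences are transcribed from the table. The script assumes that the three forward oligonucleotides, joined in the order AM8F1, AM8BF2, AM8F4, form the top strand, and that AM8BF2 is read in frame from its first base; both are assumptions made for illustration. It checks the listed lengths, checks that each reverse oligonucleotide is the reverse complement of a stretch of that top strand, and counts the longest run of consecutive arginine codons in AM8BF2.

```python
# Oligonucleotides transcribed from Table 4, written 5' to 3'.
OLIGOS = {
    "AM8F1": ("GTAGAATTCGGATCCTGGGCTGCCACAACAGCGACTT"
              "CCGCAACCGCGGCATGACCGCCCTGCTGAAGGTGAGC"
              "AGCTGCGACAAGAACACCGGCGACTACTACGAGGAC"
              "AGCTACGAGGACATCAGCGCCTACCTGCTG"),
    "AM8BF2": ("AGCAAGAACAACGCCATCGAGCCCCGCAGGCGCAGG"
               "CGCGAGATCACCCGCACCACC"),
    "AM8F4": ("CTGCAGAGCGACCAGGAGGAGATCGACTACGACGAC"
              "ACCATCAGCGTGGAAGCTTTAC"),
    "AM8R1": ("GTAAAGCTTCCACGCTGATGGTGTCGTCGTAGTCGAT"
              "CTCCTCCTGGTCGCTCTGCAGGGTGGTGCGGGTGATCT"
              "CGCG"),
    "AM8BR2": ("CCTGCGCCTGCGGGGCTCGATGGCGTTGTTCTTGCTCA"
               "GCAGGTAGGCGCTGATGTC"),
    "AM8BR4": ("CTCGTAGCTGTCCTCGTAGTAGTCGCCGGTGTTCTTGT"
               "CGCAGCTGCTCACCTTCAGCAGGGCGGTCATGCCGCG"
               "GTTGCGGAAGTCGCTGTTGTGGCAGCCCAGGATCCGA"
               "ATTCTAC"),
}
# Lengths as listed in Table 4.
TABLE_LENGTHS = {"AM8F1": 140, "AM8BF2": 57, "AM8F4": 58,
                 "AM8R1": 79, "AM8BR2": 57, "AM8BR4": 119}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-case ACGT)."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_arg_run(seq):
    """Longest run of consecutive Arg codons, reading from the first base."""
    best = run = 0
    for i in range(0, len(seq) - 2, 3):
        run = run + 1 if seq[i:i + 3] in ARG_CODONS else 0
        best = max(best, run)
    return best


# Assumed top strand: the three forward oligonucleotides joined in order.
top_strand = OLIGOS["AM8F1"] + OLIGOS["AM8BF2"] + OLIGOS["AM8F4"]

lengths_ok = all(len(OLIGOS[name]) == TABLE_LENGTHS[name] for name in OLIGOS)
reverse_ok = {name: reverse_complement(OLIGOS[name]) in top_strand
              for name in ("AM8R1", "AM8BR2", "AM8BR4")}
arg_run = longest_arg_run(OLIGOS["AM8BF2"])

print(lengths_ok, reverse_ok, arg_run)
```

Under these assumptions the reverse oligonucleotides span the junctions between the forward ones, which is the staggered arrangement needed for the set to anneal into one duplex, and the arginine run in AM8BF2 can be compared with the five consecutive Arginine residues described in the text.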

Construction of pXF8.36 

The construct for expression of human Factor VIII, pXF8.36 (Fig. 10) is an 11.1 kilobase 
circular DNA plasmid which contains the following elements: A cytomegalovirus immediate 
early I gene (CMV) 5' flanking region comprised of a promoter sequence, a 5' untranslated 
sequence (5'UTS) and first intron sequence for initiation of transcription of the Factor VIII 
cDNA. The CMV region is next fused with a wild-type B domain-deleted Factor VIII cDNA 
sequence. The Factor VIII cDNA sequence is fused, at the 3' end, with a 0.3 kb fragment of the 
human growth hormone 3' untranslated sequence. A transcription termination signal and 3' 
untranslated sequence (3' UTS) of the human growth hormone gene is used to ensure processing 
of the message immediately following the stop codon. A selectable marker gene (the bacterial 
neomycin phosphotransferase (neo) gene) is inserted downstream of the Factor VIII cDNA to 
allow selection for stably transfected mammalian cells using the neomycin analog G418. 
Expression of the neo gene is under the control of the simian virus 40 (SV40) early promoter. 
The pUC19-based amplicon carrying the pBR322-derived β-lactamase (amp) and origin of 
replication (ori) allows for the uptake, selection and propagation of the plasmid in E. coli K-12 
strains. This region was derived from the plasmid pBSII SK+. 

Construction of pXF8.38 

The construct for expression of human Factor VIII, pXF8.38 (Fig. 11), is an 11.1 kilobase 
circular DNA plasmid which contains the following elements: A cytomegalovirus immediate 
early I gene (CMV) 5' flanking region comprised of a promoter sequence, 5' untranslated 
sequence (5'UTS) and first intron sequence for initiation of transcription of the Factor VIII 
cDNA. The CMV region is next fused with a synthetic, optimally configured B domain-deleted 
Factor VIII cDNA sequence. The Factor VIII cDNA sequence is fused, at the 3' end, with a 0.3 
kb fragment of the human growth hormone 3' untranslated sequence. A transcription 
termination signal and 3' untranslated sequence (3'UTS) of the human growth hormone gene is 
used to ensure processing of the message immediately following the stop codon. A selectable 
marker gene (the bacterial neomycin phosphotransferase (neo) gene) is inserted downstream of 
the Factor VIII cDNA to allow selection for stably transfected mammalian cells using the 
neomycin analog G418. Expression of the neo gene is under the control of the simian virus 40 
(SV40) early promoter. The pUC19-based amplicon carrying the pBR322-derived β-lactamase 
(amp) and origin of replication (ori) allows for the uptake, selection and propagation of the 
plasmid in E. coli K-12 strains. This region was derived from the plasmid pBSII SK+. 

pXF8.269 Construct 

The construct for expression of human Factor VIII, pXF8.269 (Fig. 12), is a 14.8 kilobase (kb) 
circular DNA plasmid which contains the following elements: A human collagen (I) α2 
promoter which contains 0.17 kb of 5' untranslated sequence (5'UTS), aldolase A gene 5' 
untranslated sequence (5'UTS) and first intron sequence for initiation of transcription of the 
Factor VIII cDNA. The aldolase intron region is next fused with a synthetic, wild-type B 
domain-deleted Factor VIII cDNA sequence. A transcription termination signal and 3' 
untranslated sequence (3'UTS) of the human growth hormone gene is used to ensure processing 
of the message immediately following the stop codon. A selectable marker gene (the bacterial 
neomycin phosphotransferase (neo) gene) is inserted downstream of the Factor VIII cDNA to 
allow selection for stably transfected mammalian cells using the neomycin analog G418. The 
expression of the neo gene is under the control of the SV40 promoter. The pUC19-based 
amplicon carrying the pBR322-derived β-lactamase (amp) and origin of replication (ori) allows 
for the uptake, selection and propagation of the plasmid in E. coli K-12 strains. This region was 
derived from the plasmid pBSII SK+. 

pXF8.224 Construct 

The construct for expression of human Factor VIII, pXF8.224 (Fig. 13), is a 14.8 kilobase (kb) 
circular DNA plasmid which contains the following elements: A human collagen (I) α2 
promoter which contains 0.17 kb of 5' untranslated sequence (5'UTS), aldolase A gene 5' 
untranslated sequence (5'UTS) and first intron sequence for initiation of transcription of the 
Factor VIII cDNA. The aldolase intron region is next fused with a synthetic, optimally 
configured B domain-deleted Factor VIII cDNA sequence. A transcription termination signal and 
3' untranslated sequence (3'UTS) of the human growth hormone gene is used to ensure 
processing of the message immediately following the stop codon. A selectable marker gene (the 
bacterial neomycin phosphotransferase (neo) gene) is inserted downstream of the Factor VIII 
cDNA to allow selection for stably transfected mammalian cells using the neomycin analog 
G418. The expression of the neo gene is under the control of the SV40 promoter. The pUC19- 
based amplicon carrying the pBR322-derived β-lactamase (amp) and origin of replication (ori) 
allows for the uptake, selection and propagation of the plasmid in E. coli K-12 strains. This 
region was derived from the plasmid pBSII SK+. 

Clotting Assay 

A clotting assay based on an activated partial thromboplastin time (aPTT) (Proctor, et al., 
Am. J. Clin. Path., 36:212-219, (1961)) was performed to analyze the biological activity of the 
BDD hFVIII molecules expressed by constructs in which the BDD-FVIII coding region was 
optimized. 

Biological activity as analyzed using the clotting assay 

The results of the aPTT-based clotting assay are presented in Table 5, below. Specific activity of 
the hFVIII preparations is presented as aPTT units per milligram hFVIII protein as determined 
by ELISA. Both of the human fibroblast-derived BDD hFVIII molecules (5R and LE) have high 
specific activity when measured by the aPTT clotting assay. These specific activities have been 
determined to be up to 2- to 3-fold higher than those determined for CHO cell-derived full-length 
FVIII (as shown in Table 5). An average of multiple determinations of specific activities for 
various partially purified preparations of 5R and LE BDD hFVIII also shows consistently higher 
values for the BDD hFVIII molecules (11,622 Units/mg for 5R BDD hFVIII, and 14,561 
Units/mg for LE BDD hFVIII as compared to 7097 Units/mg for full-length CHO cell-derived 
FVIII). An increased rate and/or extent of thrombin activation has been observed for 
various BDD hFVIII molecules, possibly due to an effect of the B-domain to protect the heavy 
and light chains from thrombin cleavage and activation (Eaton et al., Biochemistry, 25:8343- 
8347, (1986), Meulien et al., Protein Engineering, 2:301-306, (1988)). 

Table 5. Specific Activities of Various hFVIII Proteins 

hFVIII Product                     Concentration by    aPTT Activity    Specific Activity 
                                   ELISA (mg/mL)       (aPTT U/mL)      (aPTT U/mg) 

5R BDD hFVIII                      0.050               1306             26,120 
LE BDD hFVIII                      0.124               2908             23,452 
Full-length (CHO-derived) FVIII    0.158               1454             9202 
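
The specific activities in Table 5 follow from dividing the aPTT activity by the ELISA concentration. A minimal sketch of that arithmetic is given below. It takes the 5R concentration as 0.050 mg/mL, the value implied by the listed activity and specific activity for that row, and it compares the result with the tabulated figure within a small tolerance because the tabulated inputs are rounded.

```python
# Rows of Table 5: (product, ELISA concentration in mg/mL,
# aPTT activity in U/mL, specific activity in U/mg as listed).
TABLE_5 = [
    ("5R BDD hFVIII", 0.050, 1306, 26120),
    ("LE BDD hFVIII", 0.124, 2908, 23452),
    ("Full-length (CHO-derived) FVIII", 0.158, 1454, 9202),
]


def specific_activity(activity_u_per_ml, conc_mg_per_ml):
    """Specific activity in aPTT units per mg of hFVIII protein."""
    return activity_u_per_ml / conc_mg_per_ml


computed = {name: specific_activity(act, conc)
            for name, conc, act, _ in TABLE_5}

# Fold difference of each BDD product relative to full-length FVIII.
reference = computed["Full-length (CHO-derived) FVIII"]
fold_vs_full_length = {name: value / reference
                       for name, value in computed.items()}

for name, conc, act, listed in TABLE_5:
    print(f"{name}: computed {computed[name]:.0f} U/mg, listed {listed} U/mg, "
          f"{fold_vs_full_length[name]:.2f}x full-length")
```

The ratios for the two BDD preparations can then be compared with the 2- to 3-fold range stated in the text.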





Assay for Human Factor VIII in Transfected Cell Culture Supernatants. 

Samples of cell culture supernatants from cells transfected with wild-type or 
optimized BDD human Factor VIII constructs were assayed for human Factor VIII (hFVIII) 
content by using an enzyme-linked immunosorbent assay (ELISA). This assay is based on the 
use of two non-crossreacting monoclonal antibodies (mAb) in conjunction with samples 
consisting of cell culture media collected from the supernatants of transfected human fibroblast 
cells. Methods of transfection and identification of positively transfected cells are described in 
U.S. Patent No. 5,641,670, which is incorporated herein by reference. 

Table 6 

Plasmid     Promoter / 5'                     Factor VIII cDNA         Mean (FVIII       Maximum (FVIII    Number       Fold 
            Untranslated Sequence             Composition              mU / 10^6         mU / 10^6         of Strains   Increase 
                                                                       Cells / 24 hr.)   Cells / 24 hr.) 

pXF8.36     CMV IE1                           Wild Type                567               2557              38           - 
pXF8.38     CMV IE1                           Optimal Configuration    5403              17106             24           9.5X 
pXF8.269    Collagen Iα2 / Aldolase Intron    Wild Type                382               1227              18           - 
pXF8.224    Collagen Iα2 / Aldolase Intron    Optimal Configuration    2022              11930             218          5.3X 


ELISA units based on standard curves prepared from pooled normal plasma. 
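
The fold increases in Table 6 appear to be the ratio of the mean expression of each optimized construct to the mean of the wild-type construct driven by the same promoter. The sketch below reproduces that calculation; pairing the constructs by promoter is an interpretation of the table rather than something the text states.

```python
# Mean expression (FVIII mU / 10^6 cells / 24 hr) from Table 6.
MEAN_EXPRESSION = {
    "pXF8.36": 567,    # CMV IE1, wild type
    "pXF8.38": 5403,   # CMV IE1, optimal configuration
    "pXF8.269": 382,   # collagen/aldolase, wild type
    "pXF8.224": 2022,  # collagen/aldolase, optimal configuration
}

# Optimized construct paired with its wild-type counterpart.
# The pairing by shared promoter is an assumption.
PAIRS = {"pXF8.38": "pXF8.36", "pXF8.224": "pXF8.269"}

fold_increase = {
    optimized: round(MEAN_EXPRESSION[optimized] / MEAN_EXPRESSION[wild_type], 1)
    for optimized, wild_type in PAIRS.items()
}
print(fold_increase)
```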


All patents and other references cited herein are hereby incorporated by reference. 

Equivalents 

Those skilled in the art will recognize, or be able to ascertain using no more than routine 
experimentation, many equivalents to the specific embodiments of the invention described 
herein. Such equivalents are intended to be encompassed by the following claims. 

What is claimed is: 




1. A synthetic nucleic acid sequence which encodes a protein wherein at least one non- 
common codon or less-common codon has been replaced by a common codon, and having one or 
more of the following properties: 
    (i) the synthetic nucleic acid sequence comprises a continuous stretch of at least 
90 codons all of which are common codons; 
    (ii) the synthetic nucleic acid sequence comprises a continuous stretch of common 
codons, which continuous stretch includes at least 33% or more of the codons in the synthetic 
nucleic acid sequence; or 
    (iii) wherein at least 94% or more of the codons in the sequence encoding the 
protein are common codons and wherein the synthetic nucleic acid sequence encodes a protein of 
at least about 90 amino acids in length. 

2. The synthetic nucleic acid sequence of claim 1, wherein said synthetic nucleic acid 
sequence encodes a protein wherein at least one non-common codon or less-common codon has 
been replaced by a common codon, and wherein the synthetic nucleic acid sequence comprises a 
continuous stretch of at least 90 codons all of which are common codons. 

3. The synthetic nucleic acid sequence of claim 1, wherein said synthetic nucleic acid 
sequence encodes a protein wherein at least one non-common codon or less-common codon has 
been replaced by a common codon, and wherein the synthetic nucleic acid sequence comprises a 
continuous stretch of common codons, which continuous stretch includes at least 33% or more of 
the codons in the synthetic nucleic acid sequence. 

4. The synthetic nucleic acid sequence of claim 1, wherein said synthetic nucleic acid 
sequence encodes a protein wherein at least one non-common codon or less-common codon has 
been replaced by a common codon, and wherein at least 94% or more of the codons in the 
sequence encoding the protein are common codons and wherein the synthetic nucleic acid 
sequence encodes a protein of at least about 90 amino acids in length. 

5. The nucleic acid sequence of claim 1, wherein the continuous stretch occurs in a 
nucleic acid sequence which is selected from a group of sequences consisting of a sequence of a 
pre-pro-protein; a sequence of a pro-protein; a sequence of a mature protein; a "pre" sequence of 
a pre-pro-protein; a "pre-pro" sequence of a pre-pro-protein; a "pro" sequence of a pre-pro or a 
pro-protein; or a portion of any of the aforementioned sequences. 

6. The nucleic acid sequence of claim 1, wherein the continuous stretch comprises at 
least 95 common codons. 

7. The nucleic acid sequence of claim 1, wherein the nucleic acid comprises at least 30 
non-common or less-common codons, these codons having been replaced with common codons. 

8. The nucleic acid of claim 1, wherein the number of non-common or less-common 
codons replaced or remaining is less than 15. 

9. The nucleic acid of claim 1, wherein the non-common and less-common codons, taken 
together, replaced or remaining, are equal or less than 6% of the codons in the synthetic nucleic 
acid sequence. 

10. The nucleic acid of claim 1, wherein all of the non-common or less-common codons 
of the synthetic nucleic acid sequence encoding a protein have been replaced with common 
codons. 

11. The nucleic acid of claim 1, wherein all of the non-common and less-common 
codons of the synthetic nucleic acid sequence encoding a protein have been replaced with 
common codons. 

12. The nucleic acid of claim 1, wherein the nucleic acid sequence encodes a protein of 
at least about 105 amino acids in length. 

13. The nucleic acid of claim 1, wherein at least 96% of the codons in the synthetic 
nucleic acid sequence are common codons. 

14. The nucleic acid of claim 1, wherein at least 98% of the codons in the synthetic 
nucleic acid sequence are common codons. 

15. A synthetic nucleic acid sequence which encodes Factor VIII, wherein at least one 
non-common codon or less-common codon has been replaced by a common codon and wherein 
the synthetic nucleic acid has one or more of the following properties: it has a continuous stretch 
of at least 90 codons all of which are common codons; it has a continuous stretch of common 
codons which comprise at least 33% of the codons of the synthetic nucleic acid sequence; at least 
94% or more of the codons in the sequence encoding the protein are common codons and the 
synthetic nucleic acid sequence encodes a protein of at least about 90 amino acids in length; it is 
at least 80 base pairs in length. 

16. The synthetic nucleic acid sequence of claim 15 where the factor VIII protein has 
one or more of the following characteristics: 
    a) the B domain is deleted (BDD factor VIII); 
    b) it has a recognition site for an intracellular protease of the PACE/furin class; 
or 
    c) it is inserted into a non-transformed cell. 

17. The synthetic nucleic acid sequence of claim 15, wherein the number of non- 
common or less-common codons replaced or remaining is less than 15. 

18. The synthetic nucleic acid sequence of claim 15, wherein the number of non- 
common or less-common codons replaced or remaining, taken together, are equal or less than 
6% of the codons in the synthetic nucleic acid sequence. 

19. The synthetic nucleic acid sequence of claim 15, wherein all non-common or less- 
common codons are replaced with common codons. 

20. The synthetic nucleic acid sequence of claim 15, wherein all non-common and less- 
common codons are replaced with common codons. 

21. The synthetic nucleic acid sequence of claim 15, wherein at least 96% of the codons 
in the synthetic nucleic acid sequence are common codons. 

22. The synthetic nucleic acid sequence of claim 15, wherein at least 98% of the codons 
in the synthetic nucleic acid sequence are common codons. 

23. The synthetic nucleic acid sequence of claim 15, wherein all of the codons are 
replaced with common codons. 

24. A synthetic nucleic acid sequence which encodes Factor IX, wherein at least one 
non-common codon or less-common codon has been replaced by a common codon and wherein 
the synthetic nucleic acid has one or more of the following properties: it has a continuous stretch 
of at least 90 codons all of which are common codons; it has a continuous stretch of common 
codons which comprise at least 33% of the codons of the synthetic nucleic acid sequence; at least 
94% or more of the codons in the sequence encoding the protein are common codons and the 
synthetic nucleic acid sequence encodes a protein of at least about 90 amino acids in length; it is 
at least 80 base pairs in length. 

25. The synthetic nucleic acid sequence of claim 24, wherein the factor IX protein has 
one or more of the following characteristics: 
    a) it has a PACE/furin site at a pro-peptide mature protein junction; or 
    b) is inserted into a non-transformed cell. 

26. The synthetic nucleic acid sequence of claim 24, wherein the number of non- 
common or less-common codons replaced or remaining is less than 15. 

27. The synthetic nucleic acid sequence of claim 24, wherein the number of non- 
common or less-common codons replaced or remaining, taken together, are equal or less than 
6% of the codons in the synthetic nucleic acid sequence. 

28. The synthetic nucleic acid sequence of claim 24, wherein all non-common or less- 
common codons are replaced with common codons. 

29. The synthetic nucleic acid sequence of claim 24, wherein all non-common and less- 
common codons are replaced with common codons. 

30. The synthetic nucleic acid sequence of claim 24, wherein at least 96% of the codons 
in the synthetic nucleic acid sequence are common codons. 

31. The synthetic nucleic acid sequence of claim 24, wherein at least 98% of the codons 
in the synthetic nucleic acid sequence are common codons. 

32. The synthetic nucleic acid sequence of claim 24, wherein all of the codons are 
replaced with common codons. 

33. A vector comprising the synthetic nucleic acid sequence of claim 1, 15, or 24. 

34. A cell comprising the nucleic acid sequence of claim 1, 15, or 24. 

35. A method for preparing a synthetic nucleic acid sequence which is at least 90 codons 
in length, comprising: 
    identifying a non-common codon and a less-common codon in a non-optimized 
gene sequence which encodes a protein; and 
    replacing at least 94% of the non-common and less-common codons with a 
common codon encoding the same amino acid as the replaced codon. 

36. The method of claim 35, wherein at least 96% of the non-common and less-common 
codons are replaced with a common codon encoding the same amino acid as the replaced codon. 

37. The method of claim 35, wherein at least 98% of the non-common and less-common 
codons are replaced with a common codon encoding the same amino acid as the replaced codon. 

38. The method of claim 35, wherein the nucleic acid sequence encodes a protein of at 
least about 105 or more codons in length. 

39. A method for making a nucleic acid sequence which directs the synthesis of an 
optimized message of a protein of at least 90 amino acids comprising: 
    synthesizing at least two fragments of the nucleic acid sequence, wherein the two 
fragments encode adjoining portions of the protein and wherein both subunits are mRNA 
optimized; and 
    joining the two fragments such that a non-common codon is not created at a 
junction point, thereby making the mRNA optimized nucleic acid sequence. 

40. The method of claim 39, wherein the two fragments are joined together such that a 
unique restriction endonuclease site is not created at the junction point. 

41. The method of claim 39, wherein the two fragments are joined together such that a 
unique restriction site is created. 

42. The method of claim 39, wherein three fragments of the nucleic acid sequence are 
synthesized. 

43. The method of claim 39, wherein the synthetic nucleic acid sequence encodes a 
protein of 105 or more codons in length. 

44. The method of claim 39, wherein 96% of the codons in the synthetic nucleic acid 
sequence are common codons. 

45. The method of claim 39, wherein 98% of the codons in the synthetic nucleic acid 
sequence are common codons. 

46. The method of claim 39, wherein all of the codons in the synthetic nucleic acid 
sequence are common codons. 

47. The method of claim 39, wherein the number of codons which are not common 
codons is equal to or less than 15. 

48. The method of claim 39, wherein each fragment is at least 30 codons in length. 

49. A method of providing a subject with a protein or polypeptide, comprising: 
    providing a synthetic nucleic acid sequence that can direct the synthesis of an 
optimized message for a protein or polypeptide; 
    introducing the synthetic nucleic acid sequence into the subject; and 
    allowing the subject to express the protein or polypeptide, thereby providing the 
subject with the protein. 

50. The method of claim 49, wherein the synthetic nucleic acid is introduced into a cell. 

51. The method of claim 50, wherein the cell can be an autologous, allogeneic, or 
xenogeneic cell. 

52. The method of claim 50 wherein the cell is a fibroblast, a hematopoietic stem cell, a 
myoblast, a keratinocyte, an epithelial cell, an endothelial cell, a glial cell, a neural cell, a cell 
comprising a formed element of the blood, a muscle cell and precursors of these somatic cells. 

53. The method of claim 49, wherein the codon optimized synthetic nucleic acid 
sequence can be inserted into the cell ex vivo or in vivo. 

54. The method of claim 49, wherein at least 94%, or all of the codons in the synthetic 
nucleic acid sequence are common codons. 

55. The method of claim 49, wherein at least 96%, or all of the codons in the synthetic 
nucleic acid sequence are common codons. 

56. The method of claim 49, wherein at least 98%, or all of the codons in the synthetic 
nucleic acid sequence are common codons. 

57. The method of claim 49, wherein the number of codons which are not common 
codons is equal to or less than 15. 

58. A method for preparing a synthetic nucleic acid sequence encoding a protein which 
is at least 90 codons in length, comprising identifying non-common codon and less-common 
codons in the non-optimized gene encoding the protein and replacing at least 94% or more of the 
non-common and less-common codons with a common codon encoding the same amino acid as 
the replaced codon. 

59. A primary or secondary cell of vertebrate origin having an exogenous synthetic 
nucleic acid sequence which encodes a protein or a polypeptide wherein at least one non- 
common codon or less-common codon has been replaced by a common codon and wherein the 
synthetic nucleic acid has one or more of the following properties: it has a continuous stretch of 
at least 90 codons all of which are common codons; it has a continuous stretch of common 
codons which comprise at least 33% of the codons of the synthetic nucleic acid sequence; at least 
94% or more of the codons in the sequence encoding the protein are common codons and the 
synthetic nucleic acid sequence encodes a protein of at least about 90 amino acids in length; it is 
at least 80 base pairs in length and which is free of unique restriction endonuclease sites that 
would occur in the message optimized sequence; and 
    DNA sequences, sufficient for expression of the exogenous synthetic DNA in the 
transfected primary or secondary cell; 
the primary or secondary cell capable of expressing the protein or polypeptide product. 

60. The primary or secondary cell of claim 59, wherein the exogenous synthetic nucleic 
acid is transfected into the cell. 

61. The primary or secondary cell of claim 59, wherein the exogenous synthetic nucleic 
acid sequence is stably integrated into its genome. 

62. The primary or secondary cell of claim 59, wherein the exogenous synthetic nucleic 
acid is present in the cell in an episome. 

63. The primary or secondary cell of claim 59, wherein the DNA sequence sufficient for 
expression of the exogenous synthetic nucleic acid is of non-viral origin. 

Abstract 

The present invention is directed to a synthetic nucleic acid sequence which encodes a 
protein wherein at least one non-common codon or less-common codon is replaced by a common 
codon. The synthetic nucleic acid sequence can include a continuous stretch of at least 90 
codons all of which are common codons. 
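
The codon-usage properties recited above (a continuous stretch of common codons, the share of the sequence that stretch covers, and the overall percentage of common codons) can be measured with a short script. The sketch below is illustrative only. The set `COMMON_CODONS` is a hypothetical stand-in containing one codon per amino acid; the definition of "common codon" given elsewhere in this document governs and should be substituted. The helper names `codon_stats` and `criteria` are likewise introduced here for illustration, and the short example sequence is the first 18 codons of the coding region in the sequence listing that follows.

```python
# Hypothetical stand-in for the set of "common codons" (an assumption):
# one codon per amino acid. Replace with the document's own definition.
COMMON_CODONS = {
    "GCC", "CGC", "AAC", "GAC", "TGC", "CAG", "GAG", "GGC", "CAC", "ATC",
    "CTG", "AAG", "ATG", "TTC", "CCC", "AGC", "ACC", "TGG", "TAC", "GTG",
}


def codon_stats(cds, common=COMMON_CODONS):
    """Count codons, common codons and the longest run of common codons."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("coding sequence must be a non-empty multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    longest = run = 0
    for codon in codons:
        # Extend the current run on a common codon, otherwise reset it.
        run = run + 1 if codon in common else 0
        longest = max(longest, run)
    n_common = sum(codon in common for codon in codons)
    return {"codons": len(codons), "common": n_common, "longest_run": longest}


def criteria(stats):
    """Evaluate the three numeric thresholds mentioned in the text."""
    n = stats["codons"]
    return {
        "stretch_of_90": stats["longest_run"] >= 90,
        "stretch_is_33_percent": stats["longest_run"] >= 0.33 * n,
        "94_percent_common": stats["common"] >= 0.94 * n and n >= 90,
    }


# First 18 codons of the coding region (start codon onward).
example = "ATGCAGATCGAGCTCAGCACCTGCTTCTTCCTGTGCCTGCTGCGCTTCTGCTTC"
stats = codon_stats(example)
flags = criteria(stats)
print(stats, flags)
```

Passing the codon set as a parameter keeps the measurement separate from the choice of codon table, so the same function can be rerun with whichever definition of common codon applies.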

EcoRI NheI 

1 TAGAATTCGTAGGCTAGCATGCAGATCGAGCTCAGCACCTGC'TTCTTCCTGTGCCTGCTGCGCTTCTGCTTC 
!► Mec*JlnXIaGluZ*euSerT«rCys?hePheLeuCy3LeuLeuArgPh9CysPhe 
^3 AGCGCCACCCGCCGCTACTACCTGGGCGCCGTGGAGCTGAGCTGGGACTACATGCAGAGCGACCTGGGCGAG 
19 ► serAlaThrArgArgTyrTyrLsuGlyAlaValGIuLeuSerTrpAspTyrMeCGlnSerAspLeuGlyGIu 

145 CTGCCCGTGGACGCCCGCTTCCCCCCCCGCGTGCCCAAGAGCTTCCCCTTCAACACCAGCGTGGTGTACAAG 
43 ► T.euProValAspAlaArgPhePrcPrcArgValPrcLysSerPhePrcPheAsnThrSerValValTyrLys 

217 AAGACCCTGTTCGTGGAGTTCACCGACCACCTGTTCAACATCGCCAAGCCCCGCCCCCCCTGGATGGGCCTG 
67 ► LysThrLeuPheValGluPheThrAspHisLeaPh-eAsnlleAlaLysPrcArgProProTrpMetGlyLeu 

Apai Mscl 

239 CTGGGCCCCACCATCCAGGCCGAGGTGTACGACACCGTGGTGATCACCCTGAAGAACATGGCCAGCCACCCC 
91> LeuGlyProThrlleGlnAlaGluValTyrAspThrValVallleThrLeuLysAsnHetAlaSerKisPro 
361 GTGAGCCTGCACGCCGTGGGCGTGAGCTACTGGAAGGCCAGCGAGGGCGCCGAGTACGACGACCAGACCAGC 
115 ► valSerLeuHisAlaValGlyValSerTyrTrpLysAlaSerGluGlyAlaGluTyrAspAspGlaThrSer 
433 C AGCGCGAGAAGGAGGACGACAAGGTGTTC C CC GGCGGCAGC C ACACCTACGTGTGGCAGGTGCTGAAGGAG 
139 ► GlnArgGluLysGluAspAspLysValPheProGlyGlySerHisThrTyrValTrpGlnValLeuLysGlu 

Mscl Pmil 
EOS AACGGC C CCATGGCCAGCGAC C CC CTGTGC CTGAC CTAC AGCTAC CTGAGC CACGTGGAC CTGGTGAAGGAC 
163 ► AsnGlyProMetAlaSerAspProLeuCysLeuThrTyrSerTyrleuSerHisValAspLeuValLysAsp 

Mscl 

577 CTGAACAGCGGCCTGATCGGCGCCCTGCTGGTGTGCCGCGAGGGCAGCCTGGCCAAGGAGAAGACCCAGACC 
187 ► LeuAsnSerGlyLeuIleGlyAlaLeuLeuValCysArgGluGlySerLeuAlaLysGluLysThrGlnThr 
649 CTGCACAAGTTCATCCTGCTGTTCGCCGTGTTCGACGAGGGCAAGAGCTGGCACAGCGAGACCAAGAACAGC 
211 ► LeuHisLysPhelleLeuLeuPheAlaValPheAspGluGlyLysSerTrpHisSerGluThrLysAsnSer 
721 CTGATGCAGGACCGCGACGCCGCCAGCGCCCGCGCCTGGCCCAAGATGCACACCGTGAACGGCTACGTGAAC 
235* LeuMetGlnAspArgAspAlaAlaSerAlaArgAlaTrpProLysMetHisThrValAsnGlyTyrValAsn 

Pmil 

793 CGCAGCCTGCCCGGCCTGATCGGCTGCCACCGCAAGAGCGTGTACTGGCACGTGATCGGCATGGGCACCACC 
259 ► ArgSerLeuProGlyLeuIlaGlyCysHisArgLysSerValTyrTrpHisVallleGlyMetGlyThrThr 
865 CCCGAGGTGCACAGCATCTTCCTGGAGGGCCACACCTTCCTGGTGCGCAACCACCGCCAGGCCAGCCTGGAG 
233 ► proGluValHisSerllePheLeuGluGlyHisThrPheLeuValArgAsnHisArgGlnAlaSerLeuGlu 
937 ATCAGCCCCATCACCTTCCTGACCGCCCAGACCCTGCTGATGGACCTGGGCCAGTTCCTGCTGTTCTGCCAC 
2Q7> TieSerProIleThrPheLeuThrAlaGlnThrLauLeuMetAspLeuGlyGlnPheLeuLeuPheCysHis 

1009 ATCAGCAGCCACCAGCACGACGGCATGGAGGCCTACGTGAAGGTGGACAGCTGCCCCGAGGAGCCCCAGCTG
331 ► IleSerSerHisGlnHisAspGlyMetGluAlaTyrValLysValAspSerCysProGluGluProGlnLeu

1081 CGCATGAAGAACAACGAGGAGGCCGAGGACTACGACGACGACCTGACCGACAGCGAGATGGACGTGGTGCGC
355 ► ArgMetLysAsnAsnGluGluAlaGluAspTyrAspAspAspLeuThrAspSerGluMetAspValValArg

(BglII/BamHI)

1153 TTCGACGACGACAACAGCCCCAGCTTCATCCAGATCCGCAGCGTGGCCAAGAAGCACCCCAAGACCTGGGTG
379 ► PheAspAspAspAsnSerProSerPheIleGlnIleArgSerValAlaLysLysHisProLysThrTrpVal

1225 CACTACATCGCCGCCGAGGAGGAGGACTGGGACTACGCCCCCCTGGTGCTGGCCCCCGACGACCGCAGCTAC
403 ► HisTyrIleAlaAlaGluGluGluAspTrpAspTyrAlaProLeuValLeuAlaProAspAspArgSerTyr

EagI

1297 AAGAGCCAGTACCTGAACAACGGCCCCCAGCGCATCGGCCGCAAGTACAAGAAGGTGCGCTTCATGGCCTAC
427 ► LysSerGlnTyrLeuAsnAsnGlyProGlnArgIleGlyArgLysTyrLysLysValArgPheMetAlaTyr

ApaI

1369 ACCGACGAGACCTTCAAGACCCGCGAGGCCATCCAGCACGAGAGCGGCATCCTGGGCCCCCTGCTGTACGGC
451 ► ThrAspGluThrPheLysThrArgGluAlaIleGlnHisGluSerGlyIleLeuGlyProLeuLeuTyrGly
1441 GAGGTGGGCGACACCCTGCTGATCATCTTCAAGAACCAGGCCAGCCGCCCCTACAACATCTACCCCCACGGC
475 ► GluValGlyAspThrLeuLeuIleIlePheLysAsnGlnAlaSerArgProTyrAsnIleTyrProHisGly

1513 ATCACCGACGTGCGCCCCCTGTACAGCCGCCGCCTGCCCAAGGGCGTGAAGCACCTGAAGGACTTCCCCATC
499 ► IleThrAspValArgProLeuTyrSerArgArgLeuProLysGlyValLysHisLeuLysAspPheProIle

BglII

1585 CTGCCCGGCGAGATCTTCAAGTACAAGTGGACCGTGACCGTGGAGGACGGCCCCACCAAGAGCGACCCCCGC
523 ► LeuProGlyGluIlePheLysTyrLysTrpThrValThrValGluAspGlyProThrLysSerAspProArg

1657 TGCCTGACCCGCTACTACAGCAGCTTCGTGAACATGGAGCGCGACCTGGCCAGCGGCCTGATCGGCCCCCTG
547 ► CysLeuThrArgTyrTyrSerSerPheValAsnMetGluArgAspLeuAlaSerGlyLeuIleGlyProLeu

1729 CTGATCTGCTACAAGGAGAGCGTGGACCAGCGCGGCAACCAGATCATGAGCGACAAGCGCAACGTGATCCTG
571 ► LeuIleCysTyrLysGluSerValAspGlnArgGlyAsnGlnIleMetSerAspLysArgAsnValIleLeu

KpnI

1801 TTCAGCGTGTTCGACGAGAACCGCAGCTGGTACCTGACCGAGAACATCCAGCGCTTCCTGCCCAACCCCGCC
595 ► PheSerValPheAspGluAsnArgSerTrpTyrLeuThrGluAsnIleGlnArgPheLeuProAsnProAla
1873 GGCGTGCAGCTGGAGGACCCCGAGTTCCAGGCCAGCAACATCATGCACAGCATCAACGGCTACGTGTTCGAC
619 ► GlyValGlnLeuGluAspProGluPheGlnAlaSerAsnIleMetHisSerIleAsnGlyTyrValPheAsp
1945 AGCCTGCAGCTGAGCGTGTGCCTGCACGAGGTGGCCTACTGGTACATCCTGAGCATCGGCGCCCAGACCGAC
643 ► SerLeuGlnLeuSerValCysLeuHisGluValAlaTyrTrpTyrIleLeuSerIleGlyAlaGlnThrAsp

2017 TTCCTGAGCGTGTTCTTCAGCGGCTACACCTTCAAGCACAAGATGGTGTACGAGGACACCCTGACCCTGTTC
667 ► PheLeuSerValPhePheSerGlyTyrThrPheLysHisLysMetValTyrGluAspThrLeuThrLeuPhe

BamHI

2089 CCCTTCAGCGGCGAGACCGTGTTCATGAGCATGGAGAACCCCGGCCTGTGGATCCTGGGCTGCCACAACAGC
691 ► ProPheSerGlyGluThrValPheMetSerMetGluAsnProGlyLeuTrpIleLeuGlyCysHisAsnSer

2161 GACTTCCGCAACCGCGGCATGACCGCCCTGCTGAAGGTGAGCAGCTGCGACAAGAACACCGGCGACTACTAC
715 ► AspPheArgAsnArgGlyMetThrAlaLeuLeuLysValSerSerCysAspLysAsnThrGlyAspTyrTyr

2233 GAGGACAGCTACGAGGACATCAGCGCCTACCTGCTGAGCAAGAACAACGCCATCGAGCCCCGCCTGGAGGAG
739 ► GluAspSerTyrGluAspIleSerAlaTyrLeuLeuSerLysAsnAsnAlaIleGluProArgLeuGluGlu

BstXI

2305 ATCACCCGCACCACCCTGCAGAGCGACCAGGAGGAGATCGACTACGACGACACCATCAGCGTGGAGATGAAG
763 ► IleThrArgThrThrLeuGlnSerAspGlnGluGluIleAspTyrAspAspThrIleSerValGluMetLys
2377 AAGGAGGACTTCGACATCTACGACGAGGACGAGAACCAGAGCCCCCGCAGCTTCCAGAAGAAGACCCGCCAC
787 ► LysGluAspPheAspIleTyrAspGluAspGluAsnGlnSerProArgSerPheGlnLysLysThrArgHis


PmlI

2449 TACTTCATCGCCGCCGTGGAGCGCCTGTGGGACTACGGCATGAGCAGCAGCCCCCACGTGCTGCGCAACCGC
811 ► TyrPheIleAlaAlaValGluArgLeuTrpAspTyrGlyMetSerSerSerProHisValLeuArgAsnArg

2521 GCCCAGAGCGGCAGCGTGCCCCAGTTCAAGAAGGTGGTGTTCCAGGAGTTCACCGACGGCAGCTTCACCCAG
835 ► AlaGlnSerGlySerValProGlnPheLysLysValValPheGlnGluPheThrAspGlySerPheThrGln

ApaI

2593 CCCCTGTACCGCGGCGAGCTGAACGAGCACCTGGGCCTGCTGGGCCCCTACATCCGCGCCGAGGTGGAGGAC
859 ► ProLeuTyrArgGlyGluLeuAsnGluHisLeuGlyLeuLeuGlyProTyrIleArgAlaGluValGluAsp

BstEII

2665 AACATCATGGTGACCTTCCGCAACCAGGCCAGCCGCCCCTACAGCTTCTACAGCAGCCTGATCAGCTACGAG
883 ► AsnIleMetValThrPheArgAsnGlnAlaSerArgProTyrSerPheTyrSerSerLeuIleSerTyrGlu

2737 GAGGACCAGCGCCAGGGCGCCGAGCCCCGCAAGAACTTCGTGAAGCCCAACGAGACCAAGACCTACTTCTGG
907 ► GluAspGlnArgGlnGlyAlaGluProArgLysAsnPheValLysProAsnGluThrLysThrTyrPheTrp

2809 AAGGTGCAGCACCACATGGCCCCCACCAAGGACGAGTTCGACTGCAAGGCCTGGGCCTACTTCAGCGACGTG
931 ► LysValGlnHisHisMetAlaProThrLysAspGluPheAspCysLysAlaTrpAlaTyrPheSerAspVal
2881 GACCTGGAGAAGGACGTGCACAGCGGCCTGATCGGCCCCCTGCTGGTGTGCCACACCAACACCCTGAACCCC
955 ► AspLeuGluLysAspValHisSerGlyLeuIleGlyProLeuLeuValCysHisThrAsnThrLeuAsnPro

EagI BstEII

2953 GCCCACGGCCGCCAGGTGACCGTGCAGGAGTTCGCCCTGTTCTTCACCATCTTCGACGAGACCAAGAGCTGG
979 ► AlaHisGlyArgGlnValThrValGlnGluPheAlaLeuPhePheThrIlePheAspGluThrLysSerTrp
3025 TACTTCACCGAGAACATGGAGCGCAACTGCCGCGCCCCCTGCAACATCCAGATGGAGGACCCCACCTTCAAG
1003 ► TyrPheThrGluAsnMetGluArgAsnCysArgAlaProCysAsnIleGlnMetGluAspProThrPheLys
3097 GAGAACTACCGCTTCCACGCCATCAACGGCTACATCATGGACACCCTGCCCGGCCTGGTGATGGCCCAGGAC
1027 ► GluAsnTyrArgPheHisAlaIleAsnGlyTyrIleMetAspThrLeuProGlyLeuValMetAlaGlnAsp

KpnI PmlI
3169 CAGCGCATCCGCTGGTACCTGCTGAGCATGGGCAGCAACGAGAACATCCACAGCATCCACTTCAGCGGCCAC
1051 ► GlnArgIleArgTrpTyrLeuLeuSerMetGlySerAsnGluAsnIleHisSerIleHisPheSerGlyHis
3241 GTGTTCACCGTGCGCAAGAAGGAGGAGTACAAGATGGCCCTGTACAACCTGTACCCCGGCGTGTTCGAGACC
1075 ► ValPheThrValArgLysLysGluGluTyrLysMetAlaLeuTyrAsnLeuTyrProGlyValPheGluThr
3313 GTGGAGATGCTGCCCAGCAAGGCCGGCATCTGGCGCGTGGAGTGCCTGATCGGCGAGCACCTGCACGCCGGC
1099 ► ValGluMetLeuProSerLysAlaGlyIleTrpArgValGluCysLeuIleGlyGluHisLeuHisAlaGly
3385 ATGAGCACCCTGTTCCTGGTGTACAGCAACAAGTGCCAGACCCCCCTGGGCATGGCCAGCGGCCACATCCGC
1123 ► MetSerThrLeuPheLeuValTyrSerAsnLysCysGlnThrProLeuGlyMetAlaSerGlyHisIleArg

ApaI

3457 GACTTCCAGATCACCGCCAGCGGCCAGTACGGCCAGTGGGCCCCCAAGCTGGCCCGCCTGCACTACAGCGGC
1147 ► AspPheGlnIleThrAlaSerGlyGlnTyrGlyGlnTrpAlaProLysLeuAlaArgLeuHisTyrSerGly
3529 AGCATCAACGCCTGGAGCACCAAGGAGCCCTTCAGCTGGATCAAGGTGGACCTGCTGGCCCCCATGATCATC
1171 ► SerIleAsnAlaTrpSerThrLysGluProPheSerTrpIleLysValAspLeuLeuAlaProMetIleIle
3601 CACGGCATCAAGACCCAGGGCGCCCGCCAGAAGTTCAGCAGCCTGTACATCAGCCAGTTCATCATCATGTAC
1195 ► HisGlyIleLysThrGlnGlyAlaArgGlnLysPheSerSerLeuTyrIleSerGlnPheIleIleMetTyr
3673 AGCCTGGACGGCAAGAAGTGGCAGACCTACCGCGGCAACAGCACCGGCACCCTGATGGTGTTCTTCGGCAAC
1219 ► SerLeuAspGlyLysLysTrpGlnThrTyrArgGlyAsnSerThrGlyThrLeuMetValPhePheGlyAsn

(SmaI/EcoRV)

3745 GTGGACAGCAGCGGCATCAAGCACAACATCTTCAACCCCCCCATCATCGCCCGCTACATCCGCCTGCACCCC
1243 ► ValAspSerSerGlyIleLysHisAsnIlePheAsnProProIleIleAlaArgTyrIleArgLeuHisPro
3817 ACCCACTACAGCATCCGCAGCACCCTGCGCATGGAGCTGATGGGCTGCGACCTGAACAGCTGCAGCATGCCC
1267 ► ThrHisTyrSerIleArgSerThrLeuArgMetGluLeuMetGlyCysAspLeuAsnSerCysSerMetPro
3889 CTGGGCATGGAGAGCAAGGCCATCAGCGACGCCCAGATCACCGCCAGCAGCTACTTCACCAACATGTTCGCC
1291 ► LeuGlyMetGluSerLysAlaIleSerAspAlaGlnIleThrAlaSerSerTyrPheThrAsnMetPheAla
3961 ACCTGGAGCCCCAGCAAGGCCCGCCTGCACCTGCAGGGCCGCAGCAACGCCTGGCGCCCCCAGGTGAACAAC
1315 ► ThrTrpSerProSerLysAlaArgLeuHisLeuGlnGlyArgSerAsnAlaTrpArgProGlnValAsnAsn

BstEII

4033 CCCAAGGAGTGGCTGCAGGTGGACTTCCAGAAGACCATGAAGGTGACCGGCGTGACCACCCAGGGCGTGAAG
1339 ► ProLysGluTrpLeuGlnValAspPheGlnLysThrMetLysValThrGlyValThrThrGlnGlyValLys
4105 AGCCTGCTGACCAGCATGTACGTGAAGGAGTTCCTGATCAGCAGCAGCCAGGACGGCCACCAGTGGACCCTG
1363 ► SerLeuLeuThrSerMetTyrValLysGluPheLeuIleSerSerSerGlnAspGlyHisGlnTrpThrLeu
4177 TTCTTCCAGAACGGCAAGGTGAAGGTGTTCCAGGGCAACCAGGACAGCTTCACCCCCGTGGTGAACAGCCTG
1387 ► PhePheGlnAsnGlyLysValLysValPheGlnGlyAsnGlnAspSerPheThrProValValAsnSerLeu
4249 GACCCCCCCCTGCTGACCCGCTACCTGCGCATCCACCCCCAGAGCTGGGTGCACCAGATCGCCCTGCGCATG
1411 ► AspProProLeuLeuThrArgTyrLeuArgIleHisProGlnSerTrpValHisGlnIleAlaLeuArgMet

SmaI HindIII

4321 GAGGTGCTGGGCTGCGAGGCCCAGGACCTGTACTAGCTGCCCGGGCTACAAGCTTT
1435 ► GluValLeuGlyCysGluAlaGlnAspLeuTyr • • •

EcoRI NheI

1 TAGAATTCGTAGGCTAGCATGCAGATCGAGCTGAGCACCTGCTTCTTCCTGTGCCTGCTGCGCTTCTGCTTC
1 ► MetGlnIleGluLeuSerThrCysPhePheLeuCysLeuLeuArgPheCysPhe

73 AGCGCCACCCGCCGCTACTACCTGGGCGCCGTGGAGCTGAGCTGGGACTACATGCAGAGCGACCTGGGCGAG
19 ► SerAlaThrArgArgTyrTyrLeuGlyAlaValGluLeuSerTrpAspTyrMetGlnSerAspLeuGlyGlu
145 CTGCCCGTGGACGCCCGCTTCCCCCCCCGCGTGCCCAAGAGCTTCCCCTTCAACACCAGCGTGGTGTACAAG
43 ► LeuProValAspAlaArgPheProProArgValProLysSerPheProPheAsnThrSerValValTyrLys
217 AAGACCCTGTTCGTGGAGTTCACCGACCACCTGTTCAACATCGCCAAGCCCCGCCCCCCCTGGATGGGCCTG
67 ► LysThrLeuPheValGluPheThrAspHisLeuPheAsnIleAlaLysProArgProProTrpMetGlyLeu

ApaI MscI

289 CTGGGCCCCACCATCCAGGCCGAGGTGTACGACACCGTGGTGATCACCCTGAAGAACATGGCCAGCCACCCC
91 ► LeuGlyProThrIleGlnAlaGluValTyrAspThrValValIleThrLeuLysAsnMetAlaSerHisPro
361 GTGAGCCTGCACGCCGTGGGCGTGAGCTACTGGAAGGCCAGCGAGGGCGCCGAGTACGACGACCAGACCAGC
115 ► ValSerLeuHisAlaValGlyValSerTyrTrpLysAlaSerGluGlyAlaGluTyrAspAspGlnThrSer
433 CAGCGCGAGAAGGAGGACGACAAGGTGTTCCCCGGCGGCAGCCACACCTACGTGTGGCAGGTGCTGAAGGAG
139 ► GlnArgGluLysGluAspAspLysValPheProGlyGlySerHisThrTyrValTrpGlnValLeuLysGlu

MscI PmlI
505 AACGGCCCCATGGCCAGCGACCCCCTGTGCCTGACCTACAGCTACCTGAGCCACGTGGACCTGGTGAAGGAC
163 ► AsnGlyProMetAlaSerAspProLeuCysLeuThrTyrSerTyrLeuSerHisValAspLeuValLysAsp

MscI

577 CTGAACAGCGGCCTGATCGGCGCCCTGCTGGTGTGCCGCGAGGGCAGCCTGGCCAAGGAGAAGACCCAGACC
187 ► LeuAsnSerGlyLeuIleGlyAlaLeuLeuValCysArgGluGlySerLeuAlaLysGluLysThrGlnThr
649 CTGCACAAGTTCATCCTGCTGTTCGCCGTGTTCGACGAGGGCAAGAGCTGGCACAGCGAGACCAAGAACAGC
211 ► LeuHisLysPheIleLeuLeuPheAlaValPheAspGluGlyLysSerTrpHisSerGluThrLysAsnSer
721 CTGATGCAGGACCGCGACGCCGCCAGCGCCCGCGCCTGGCCCAAGATGCACACCGTGAACGGCTACGTGAAC
235 ► LeuMetGlnAspArgAspAlaAlaSerAlaArgAlaTrpProLysMetHisThrValAsnGlyTyrValAsn

PmlI

793 CGCAGCCTGCCCGGCCTGATCGGCTGCCACCGCAAGAGCGTGTACTGGCACGTGATCGGCATGGGCACCACC
259 ► ArgSerLeuProGlyLeuIleGlyCysHisArgLysSerValTyrTrpHisValIleGlyMetGlyThrThr
865 CCCGAGGTGCACAGCATCTTCCTGGAGGGCCACACCTTCCTGGTGCGCAACCACCGCCAGGCCAGCCTGGAG
283 ► ProGluValHisSerIlePheLeuGluGlyHisThrPheLeuValArgAsnHisArgGlnAlaSerLeuGlu
937 ATCAGCCCCATCACCTTCCTGACCGCCCAGACCCTGCTGATGGACCTGGGCCAGTTCCTGCTGTTCTGCCAC
307 ► IleSerProIleThrPheLeuThrAlaGlnThrLeuLeuMetAspLeuGlyGlnPheLeuLeuPheCysHis

1009 ATCAGCAGCCACCAGCACGACGGCATGGAGGCCTACGTGAAGGTGGACAGCTGCCCCGAGGAGCCCCAGCTG
331 ► IleSerSerHisGlnHisAspGlyMetGluAlaTyrValLysValAspSerCysProGluGluProGlnLeu

1081 CGCATGAAGAACAACGAGGAGGCCGAGGACTACGACGACGACCTGACCGACAGCGAGATGGACGTGGTGCGC
355 ► ArgMetLysAsnAsnGluGluAlaGluAspTyrAspAspAspLeuThrAspSerGluMetAspValValArg

(BglII/BamHI)

1153 TTCGACGACGACAACAGCCCCAGCTTCATCCAGATCCGCAGCGTGGCCAAGAAGCACCCCAAGACCTGGGTG
379 ► PheAspAspAspAsnSerProSerPheIleGlnIleArgSerValAlaLysLysHisProLysThrTrpVal

1225 CACTACATCGCCGCCGAGGAGGAGGACTGGGACTACGCCCCCCTGGTGCTGGCCCCCGACGACCGCAGCTAC
403 ► HisTyrIleAlaAlaGluGluGluAspTrpAspTyrAlaProLeuValLeuAlaProAspAspArgSerTyr

EagI

1297 AAGAGCCAGTACCTGAACAACGGCCCCCAGCGCATCGGCCGCAAGTACAAGAAGGTGCGCTTCATGGCCTAC
427 ► LysSerGlnTyrLeuAsnAsnGlyProGlnArgIleGlyArgLysTyrLysLysValArgPheMetAlaTyr

ApaI

1369 ACCGACGAGACCTTCAAGACCCGCGAGGCCATCCAGCACGAGAGCGGCATCCTGGGCCCCCTGCTGTACGGC
451 ► ThrAspGluThrPheLysThrArgGluAlaIleGlnHisGluSerGlyIleLeuGlyProLeuLeuTyrGly
1441 GAGGTGGGCGACACCCTGCTGATCATCTTCAAGAACCAGGCCAGCCGCCCCTACAACATCTACCCCCACGGC
475 ► GluValGlyAspThrLeuLeuIleIlePheLysAsnGlnAlaSerArgProTyrAsnIleTyrProHisGly

1513 ATCACCGACGTGCGCCCCCTGTACAGCCGCCGCCTGCCCAAGGGCGTGAAGCACCTGAAGGACTTCCCCATC
499 ► IleThrAspValArgProLeuTyrSerArgArgLeuProLysGlyValLysHisLeuLysAspPheProIle

BglII

1585 CTGCCCGGCGAGATCTTCAAGTACAAGTGGACCGTGACCGTGGAGGACGGCCCCACCAAGAGCGACCCCCGC
523 ► LeuProGlyGluIlePheLysTyrLysTrpThrValThrValGluAspGlyProThrLysSerAspProArg

1657 TGCCTGACCCGCTACTACAGCAGCTTCGTGAACATGGAGCGCGACCTGGCCAGCGGCCTGATCGGCCCCCTG
547 ► CysLeuThrArgTyrTyrSerSerPheValAsnMetGluArgAspLeuAlaSerGlyLeuIleGlyProLeu

1729 CTGATCTGCTACAAGGAGAGCGTGGACCAGCGCGGCAACCAGATCATGAGCGACAAGCGCAACGTGATCCTG
571 ► LeuIleCysTyrLysGluSerValAspGlnArgGlyAsnGlnIleMetSerAspLysArgAsnValIleLeu

KpnI

1801 TTCAGCGTGTTCGACGAGAACCGCAGCTGGTACCTGACCGAGAACATCCAGCGCTTCCTGCCCAACCCCGCC
595 ► PheSerValPheAspGluAsnArgSerTrpTyrLeuThrGluAsnIleGlnArgPheLeuProAsnProAla
1873 GGCGTGCAGCTGGAGGACCCCGAGTTCCAGGCCAGCAACATCATGCACAGCATCAACGGCTACGTGTTCGAC
619 ► GlyValGlnLeuGluAspProGluPheGlnAlaSerAsnIleMetHisSerIleAsnGlyTyrValPheAsp
1945 AGCCTGCAGCTGAGCGTGTGCCTGCACGAGGTGGCCTACTGGTACATCCTGAGCATCGGCGCCCAGACCGAC
643 ► SerLeuGlnLeuSerValCysLeuHisGluValAlaTyrTrpTyrIleLeuSerIleGlyAlaGlnThrAsp
2017 TTCCTGAGCGTGTTCTTCAGCGGCTACACCTTCAAGCACAAGATGGTGTACGAGGACACCCTGACCCTGTTC
667 ► PheLeuSerValPhePheSerGlyTyrThrPheLysHisLysMetValTyrGluAspThrLeuThrLeuPhe

BamHI

2089 CCCTTCAGCGGCGAGACCGTGTTCATGAGCATGGAGAACCCCGGCCTGTGGATCCTGGGCTGCCACAACAGC
691 ► ProPheSerGlyGluThrValPheMetSerMetGluAsnProGlyLeuTrpIleLeuGlyCysHisAsnSer

2161 GACTTCCGCAACCGCGGCATGACCGCCCTGCTGAAGGTGAGCAGCTGCGACAAGAACACCGGCGACTACTAC
715 ► AspPheArgAsnArgGlyMetThrAlaLeuLeuLysValSerSerCysAspLysAsnThrGlyAspTyrTyr

2233 GAGGACAGCTACGAGGACATCAGCGCCTACCTGCTGAGCAAGAACAACGCCATCGAGCCCCGCAGGCGCAGG
739 ► GluAspSerTyrGluAspIleSerAlaTyrLeuLeuSerLysAsnAsnAlaIleGluProArgArgArgArg

BstXI

2305 CGCGAGATCACCCGCACCACCCTGCAGAGCGACCAGGAGGAGATCGACTACGACGACACCATCAGCGTGGAG
763 ► ArgGluIleThrArgThrThrLeuGlnSerAspGlnGluGluIleAspTyrAspAspThrIleSerValGlu

2377 ATGAAGAAGGAGGACTTCGACATCTACGACGAGGACGAGAACCAGAGCCCCCGCAGCTTCCAGAAGAAGACC
787 ► MetLysLysGluAspPheAspIleTyrAspGluAspGluAsnGlnSerProArgSerPheGlnLysLysThr

PmlI

2449 CGCCACTACTTCATCGCCGCCGTGGAGCGCCTGTGGGACTACGGCATGAGCAGCAGCCCCCACGTGCTGCGC
811 ► ArgHisTyrPheIleAlaAlaValGluArgLeuTrpAspTyrGlyMetSerSerSerProHisValLeuArg

2521 AACCGCGCCCAGAGCGGCAGCGTGCCCCAGTTCAAGAAGGTGGTGTTCCAGGAGTTCACCGACGGCAGCTTC
835 ► AsnArgAlaGlnSerGlySerValProGlnPheLysLysValValPheGlnGluPheThrAspGlySerPhe

ApaI

2593 ACCCAGCCCCTGTACCGCGGCGAGCTGAACGAGCACCTGGGCCTGCTGGGCCCCTACATCCGCGCCGAGGTG
859 ► ThrGlnProLeuTyrArgGlyGluLeuAsnGluHisLeuGlyLeuLeuGlyProTyrIleArgAlaGluVal

BstEII

2665 GAGGACAACATCATGGTGACCTTCCGCAACCAGGCCAGCCGCCCCTACAGCTTCTACAGCAGCCTGATCAGC
883 ► GluAspAsnIleMetValThrPheArgAsnGlnAlaSerArgProTyrSerPheTyrSerSerLeuIleSer

2737 TACGAGGAGGACCAGCGCCAGGGCGCCGAGCCCCGCAAGAACTTCGTGAAGCCCAACGAGACCAAGACCTAC
907 ► TyrGluGluAspGlnArgGlnGlyAlaGluProArgLysAsnPheValLysProAsnGluThrLysThrTyr

2809 TTCTGGAAGGTGCAGCACCACATGGCCCCCACCAAGGACGAGTTCGACTGCAAGGCCTGGGCCTACTTCAGC
931 ► PheTrpLysValGlnHisHisMetAlaProThrLysAspGluPheAspCysLysAlaTrpAlaTyrPheSer


FIG. 9 (2 of 3) 



2881 GACGTGGACCTGGAGAAGGACGTGCACAGCGGCCTGATCGGCCCCCTGCTGGTGTGCCACACCAACACCCTG
955 ► AspValAspLeuGluLysAspValHisSerGlyLeuIleGlyProLeuLeuValCysHisThrAsnThrLeu

EagI BstEII

2953 AACCCCGCCCACGGCCGCCAGGTGACCGTGCAGGAGTTCGCCCTGTTCTTCACCATCTTCGACGAGACCAAG
979 ► AsnProAlaHisGlyArgGlnValThrValGlnGluPheAlaLeuPhePheThrIlePheAspGluThrLys
3025 AGCTGGTACTTCACCGAGAACATGGAGCGCAACTGCCGCGCCCCCTGCAACATCCAGATGGAGGACCCCACC
1003 ► SerTrpTyrPheThrGluAsnMetGluArgAsnCysArgAlaProCysAsnIleGlnMetGluAspProThr
3097 TTCAAGGAGAACTACCGCTTCCACGCCATCAACGGCTACATCATGGACACCCTGCCCGGCCTGGTGATGGCC
1027 ► PheLysGluAsnTyrArgPheHisAlaIleAsnGlyTyrIleMetAspThrLeuProGlyLeuValMetAla

KpnI

3169 CAGGACCAGCGCATCCGCTGGTACCTGCTGAGCATGGGCAGCAACGAGAACATCCACAGCATCCACTTCAGC
1051 ► GlnAspGlnArgIleArgTrpTyrLeuLeuSerMetGlySerAsnGluAsnIleHisSerIleHisPheSer

PmlI

3241 GGCCACGTGTTCACCGTGCGCAAGAAGGAGGAGTACAAGATGGCCCTGTACAACCTGTACCCCGGCGTGTTC
1075 ► GlyHisValPheThrValArgLysLysGluGluTyrLysMetAlaLeuTyrAsnLeuTyrProGlyValPhe
3313 GAGACCGTGGAGATGCTGCCCAGCAAGGCCGGCATCTGGCGCGTGGAGTGCCTGATCGGCGAGCACCTGCAC
1099 ► GluThrValGluMetLeuProSerLysAlaGlyIleTrpArgValGluCysLeuIleGlyGluHisLeuHis
3385 GCCGGCATGAGCACCCTGTTCCTGGTGTACAGCAACAAGTGCCAGACCCCCCTGGGCATGGCCAGCGGCCAC
1123 ► AlaGlyMetSerThrLeuPheLeuValTyrSerAsnLysCysGlnThrProLeuGlyMetAlaSerGlyHis

ApaI

3457 ATCCGCGACTTCCAGATCACCGCCAGCGGCCAGTACGGCCAGTGGGCCCCCAAGCTGGCCCGCCTGCACTAC
1147 ► IleArgAspPheGlnIleThrAlaSerGlyGlnTyrGlyGlnTrpAlaProLysLeuAlaArgLeuHisTyr
3529 AGCGGCAGCATCAACGCCTGGAGCACCAAGGAGCCCTTCAGCTGGATCAAGGTGGACCTGCTGGCCCCCATG
1171 ► SerGlySerIleAsnAlaTrpSerThrLysGluProPheSerTrpIleLysValAspLeuLeuAlaProMet
3601 ATCATCCACGGCATCAAGACCCAGGGCGCCCGCCAGAAGTTCAGCAGCCTGTACATCAGCCAGTTCATCATC
1195 ► IleIleHisGlyIleLysThrGlnGlyAlaArgGlnLysPheSerSerLeuTyrIleSerGlnPheIleIle
3673 ATGTACAGCCTGGACGGCAAGAAGTGGCAGACCTACCGCGGCAACAGCACCGGCACCCTGATGGTGTTCTTC
1219 ► MetTyrSerLeuAspGlyLysLysTrpGlnThrTyrArgGlyAsnSerThrGlyThrLeuMetValPhePhe

(SmaI/EcoRV)

3745 GGCAACGTGGACAGCAGCGGCATCAAGCACAACATCTTCAACCCCCCCATCATCGCCCGCTACATCCGCCTG
1243 ► GlyAsnValAspSerSerGlyIleLysHisAsnIlePheAsnProProIleIleAlaArgTyrIleArgLeu
3817 CACCCCACCCACTACAGCATCCGCAGCACCCTGCGCATGGAGCTGATGGGCTGCGACCTGAACAGCTGCAGC
1267 ► HisProThrHisTyrSerIleArgSerThrLeuArgMetGluLeuMetGlyCysAspLeuAsnSerCysSer
3889 ATGCCCCTGGGCATGGAGAGCAAGGCCATCAGCGACGCCCAGATCACCGCCAGCAGCTACTTCACCAACATG
1291 ► MetProLeuGlyMetGluSerLysAlaIleSerAspAlaGlnIleThrAlaSerSerTyrPheThrAsnMet
3961 TTCGCCACCTGGAGCCCCAGCAAGGCCCGCCTGCACCTGCAGGGCCGCAGCAACGCCTGGCGCCCCCAGGTG
1315 ► PheAlaThrTrpSerProSerLysAlaArgLeuHisLeuGlnGlyArgSerAsnAlaTrpArgProGlnVal

BstEII

4033 AACAACCCCAAGGAGTGGCTGCAGGTGGACTTCCAGAAGACCATGAAGGTGACCGGCGTGACCACCCAGGGC
1339 ► AsnAsnProLysGluTrpLeuGlnValAspPheGlnLysThrMetLysValThrGlyValThrThrGlnGly
4105 GTGAAGAGCCTGCTGACCAGCATGTACGTGAAGGAGTTCCTGATCAGCAGCAGCCAGGACGGCCACCAGTGG
1363 ► ValLysSerLeuLeuThrSerMetTyrValLysGluPheLeuIleSerSerSerGlnAspGlyHisGlnTrp
4177 ACCCTGTTCTTCCAGAACGGCAAGGTGAAGGTGTTCCAGGGCAACCAGGACAGCTTCACCCCCGTGGTGAAC
1387 ► ThrLeuPhePheGlnAsnGlyLysValLysValPheGlnGlyAsnGlnAspSerPheThrProValValAsn
4249 AGCCTGGACCCCCCCCTGCTGACCCGCTACCTGCGCATCCACCCCCAGAGCTGGGTGCACCAGATCGCCCTG
1411 ► SerLeuAspProProLeuLeuThrArgTyrLeuArgIleHisProGlnSerTrpValHisGlnIleAlaLeu

SmaI HindIII

4321 CGCATGGAGGTGCTGGGCTGCGAGGCCCAGGACCTGTACTAGCTGCCCGGGCTACAAGCTTTAC
1435 ► ArgMetGluValLeuGlyCysGluAlaGlnAspLeuTyr • • •

SEQUENCE LISTING

<120> OPTIMIZED MESSENGER RNA
<130> 10278/009001
<160> 4

<170> FastSEQ for Windows Version 3.0

<210> 1
<211> 4376
<212> DNA
<213> Artificial Sequence

<220>
<221> Artificial Sequence

<220>
<221> CDS
<222> (19)...(4353)
<223> Modified HUMAN Factor VIII

<400> 1

tagaattcgt aggctagc atg cag atc gag ctg agc acc tgc ttc ttc ctg 51
Met Gln Ile Glu Leu Ser Thr Cys Phe Phe Leu
1 5 10

tgc ctg ctg cgc ttc tgc ttc agc gcc acc cgc cgc tac tac ctg ggc 99
Cys Leu Leu Arg Phe Cys Phe Ser Ala Thr Arg Arg Tyr Tyr Leu Gly
15 20 25

gcc gtg gag ctg agc tgg gac tac atg cag agc gac ctg ggc gag ctg 147
Ala Val Glu Leu Ser Trp Asp Tyr Met Gln Ser Asp Leu Gly Glu Leu
30 35 40

ccc gtg gac gcc cgc ttc ccc ccc cgc gtg ccc aag agc ttc ccc ttc 195
Pro Val Asp Ala Arg Phe Pro Pro Arg Val Pro Lys Ser Phe Pro Phe
45 50 55

aac acc agc gtg gtg tac aag aag acc ctg ttc gtg gag ttc acc gac 243
Asn Thr Ser Val Val Tyr Lys Lys Thr Leu Phe Val Glu Phe Thr Asp
60 65 70 75

cac ctg ttc aac atc gcc aag ccc cgc ccc ccc tgg atg ggc ctg ctg 291
His Leu Phe Asn Ile Ala Lys Pro Arg Pro Pro Trp Met Gly Leu Leu
80 85 90

ggc ccc acc atc cag gcc gag gtg tac gac acc gtg gtg atc acc ctg 339
Gly Pro Thr Ile Gln Ala Glu Val Tyr Asp Thr Val Val Ile Thr Leu
95 100 105

aag aac atg gcc agc cac ccc gtg agc ctg cac gcc gtg ggc gtg agc 387
Lys Asn Met Ala Ser His Pro Val Ser Leu His Ala Val Gly Val Ser
110 115 120

tac tgg aag gcc agc gag ggc gcc gag tac gac gac cag acc agc cag 435
Tyr Trp Lys Ala Ser Glu Gly Ala Glu Tyr Asp Asp Gln Thr Ser Gln
125 130 135


cgc gag aag gag gac gac aag gtg ttc ccc ggc ggc agc cac acc tac 483
Arg Glu Lys Glu Asp Asp Lys Val Phe Pro Gly Gly Ser His Thr Tyr
140 145 150 155

gtg tgg cag gtg ctg aag gag aac ggc ccc atg gcc agc gac ccc ctg 531
Val Trp Gln Val Leu Lys Glu Asn Gly Pro Met Ala Ser Asp Pro Leu
160 165 170

tgc ctg acc tac agc tac ctg agc cac gtg gac ctg gtg aag gac ctg 579
Cys Leu Thr Tyr Ser Tyr Leu Ser His Val Asp Leu Val Lys Asp Leu
175 180 185

aac agc ggc ctg atc ggc gcc ctg ctg gtg tgc cgc gag ggc agc ctg 627
Asn Ser Gly Leu Ile Gly Ala Leu Leu Val Cys Arg Glu Gly Ser Leu
190 195 200

gcc aag gag aag acc cag acc ctg cac aag ttc atc ctg ctg ttc gcc 675
Ala Lys Glu Lys Thr Gln Thr Leu His Lys Phe Ile Leu Leu Phe Ala
205 210 215

gtg ttc gac gag ggc aag agc tgg cac agc gag acc aag aac agc ctg 723
Val Phe Asp Glu Gly Lys Ser Trp His Ser Glu Thr Lys Asn Ser Leu
220 225 230 235

atg cag gac cgc gac gcc gcc agc gcc cgc gcc tgg ccc aag atg cac 771
Met Gln Asp Arg Asp Ala Ala Ser Ala Arg Ala Trp Pro Lys Met His
240 245 250

acc gtg aac ggc tac gtg aac cgc agc ctg ccc ggc ctg atc ggc tgc 819
Thr Val Asn Gly Tyr Val Asn Arg Ser Leu Pro Gly Leu Ile Gly Cys
255 260 265

cac cgc aag agc gtg tac tgg cac gtg atc ggc atg ggc acc acc ccc 867
His Arg Lys Ser Val Tyr Trp His Val Ile Gly Met Gly Thr Thr Pro
270 275 280

gag gtg cac agc atc ttc ctg gag ggc cac acc ttc ctg gtg cgc aac 915
Glu Val His Ser Ile Phe Leu Glu Gly His Thr Phe Leu Val Arg Asn
285 290 295

cac cgc cag gcc agc ctg gag atc agc ccc atc acc ttc ctg acc gcc 963
His Arg Gln Ala Ser Leu Glu Ile Ser Pro Ile Thr Phe Leu Thr Ala
300 305 310 315

cag acc ctg ctg atg gac ctg ggc cag ttc ctg ctg ttc tgc cac atc 1011
Gln Thr Leu Leu Met Asp Leu Gly Gln Phe Leu Leu Phe Cys His Ile
320 325 330

agc agc cac cag cac gac ggc atg gag gcc tac gtg aag gtg gac agc 1059
Ser Ser His Gln His Asp Gly Met Glu Ala Tyr Val Lys Val Asp Ser
335 340 345

tgc ccc gag gag ccc cag ctg cgc atg aag aac aac gag gag gcc gag 1107
Cys Pro Glu Glu Pro Gln Leu Arg Met Lys Asn Asn Glu Glu Ala Glu
350 355 360

gac tac gac gac gac ctg acc gac agc gag atg gac gtg gtg cgc ttc 1155
Asp Tyr Asp Asp Asp Leu Thr Asp Ser Glu Met Asp Val Val Arg Phe
365 370 375

gac gac gac aac agc ccc agc ttc atc cag atc cgc agc gtg gcc aag 1203
Asp Asp Asp Asn Ser Pro Ser Phe Ile Gln Ile Arg Ser Val Ala Lys
380 385 390 395


aag cac ccc aag acc tgg gtg cac tac atc gcc gcc gag gag gag gac 1251
Lys His Pro Lys Thr Trp Val His Tyr Ile Ala Ala Glu Glu Glu Asp
400 405 410

tgg gac tac gcc ccc ctg gtg ctg gcc ccc gac gac cgc agc tac aag 1299
Trp Asp Tyr Ala Pro Leu Val Leu Ala Pro Asp Asp Arg Ser Tyr Lys
415 420 425

agc cag tac ctg aac aac ggc ccc cag cgc atc ggc cgc aag tac aag 1347
Ser Gln Tyr Leu Asn Asn Gly Pro Gln Arg Ile Gly Arg Lys Tyr Lys
430 435 440

aag gtg cgc ttc atg gcc tac acc gac gag acc ttc aag acc cgc gag 1395
Lys Val Arg Phe Met Ala Tyr Thr Asp Glu Thr Phe Lys Thr Arg Glu
445 450 455

gcc atc cag cac gag agc ggc atc ctg ggc ccc ctg ctg tac ggc gag 1443
Ala Ile Gln His Glu Ser Gly Ile Leu Gly Pro Leu Leu Tyr Gly Glu
460 465 470 475

gtg ggc gac acc ctg ctg atc atc ttc aag aac cag gcc agc cgc ccc 1491
Val Gly Asp Thr Leu Leu Ile Ile Phe Lys Asn Gln Ala Ser Arg Pro
480 485 490

tac aac atc tac ccc cac ggc atc acc gac gtg cgc ccc ctg tac agc 1539
Tyr Asn Ile Tyr Pro His Gly Ile Thr Asp Val Arg Pro Leu Tyr Ser
495 500 505

cgc cgc ctg ccc aag ggc gtg aag cac ctg aag gac ttc ccc atc ctg 1587
Arg Arg Leu Pro Lys Gly Val Lys His Leu Lys Asp Phe Pro Ile Leu
510 515 520

ccc ggc gag atc ttc aag tac aag tgg acc gtg acc gtg gag gac ggc 1635
Pro Gly Glu Ile Phe Lys Tyr Lys Trp Thr Val Thr Val Glu Asp Gly
525 530 535

ccc acc aag agc gac ccc cgc tgc ctg acc cgc tac tac agc agc ttc 1683
Pro Thr Lys Ser Asp Pro Arg Cys Leu Thr Arg Tyr Tyr Ser Ser Phe
540 545 550 555

gtg aac atg gag cgc gac ctg gcc agc ggc ctg atc ggc ccc ctg ctg 1731
Val Asn Met Glu Arg Asp Leu Ala Ser Gly Leu Ile Gly Pro Leu Leu
560 565 570

atc tgc tac aag gag agc gtg gac cag cgc ggc aac cag atc atg agc 1779
Ile Cys Tyr Lys Glu Ser Val Asp Gln Arg Gly Asn Gln Ile Met Ser
575 580 585

gac aag cgc aac gtg atc ctg ttc agc gtg ttc gac gag aac cgc agc 1827
Asp Lys Arg Asn Val Ile Leu Phe Ser Val Phe Asp Glu Asn Arg Ser
590 595 600

tgg tac ctg acc gag aac atc cag cgc ttc ctg ccc aac ccc gcc ggc 1875
Trp Tyr Leu Thr Glu Asn Ile Gln Arg Phe Leu Pro Asn Pro Ala Gly
605 610 615

gtg cag ctg gag gac ccc gag ttc cag gcc agc aac atc atg cac agc 1923
Val Gln Leu Glu Asp Pro Glu Phe Gln Ala Ser Asn Ile Met His Ser
620 625 630 635

atc aac ggc tac gtg ttc gac agc ctg cag ctg agc gtg tgc ctg cac 1971
Ile Asn Gly Tyr Val Phe Asp Ser Leu Gln Leu Ser Val Cys Leu His
640 645 650


gag gtg gcc tac tgg tac atc ctg agc atc ggc gcc cag acc gac ttc 2019
Glu Val Ala Tyr Trp Tyr Ile Leu Ser Ile Gly Ala Gln Thr Asp Phe
655 660 665

ctg agc gtg ttc ttc agc ggc tac acc ttc aag cac aag atg gtg tac 2067
Leu Ser Val Phe Phe Ser Gly Tyr Thr Phe Lys His Lys Met Val Tyr
670 675 680

gag gac acc ctg acc ctg ttc ccc ttc agc ggc gag acc gtg ttc atg 2115
Glu Asp Thr Leu Thr Leu Phe Pro Phe Ser Gly Glu Thr Val Phe Met
685 690 695

agc atg gag aac ccc ggc ctg tgg atc ctg ggc tgc cac aac agc gac 2163
Ser Met Glu Asn Pro Gly Leu Trp Ile Leu Gly Cys His Asn Ser Asp
700 705 710 715

ttc cgc aac cgc ggc atg acc gcc ctg ctg aag gtg agc agc tgc gac 2211
Phe Arg Asn Arg Gly Met Thr Ala Leu Leu Lys Val Ser Ser Cys Asp
720 725 730

aag aac acc ggc gac tac tac gag gac agc tac gag gac atc agc gcc 2259
Lys Asn Thr Gly Asp Tyr Tyr Glu Asp Ser Tyr Glu Asp Ile Ser Ala
735 740 745

tac ctg ctg agc aag aac aac gcc atc gag ccc cgc ctg gag gag atc 2307
Tyr Leu Leu Ser Lys Asn Asn Ala Ile Glu Pro Arg Leu Glu Glu Ile
750 755 760

acc cgc acc acc ctg cag agc gac cag gag gag atc gac tac gac gac 2355
Thr Arg Thr Thr Leu Gln Ser Asp Gln Glu Glu Ile Asp Tyr Asp Asp
765 770 775

acc atc agc gtg gag atg aag aag gag gac ttc gac atc tac gac gag 2403
Thr Ile Ser Val Glu Met Lys Lys Glu Asp Phe Asp Ile Tyr Asp Glu
780 785 790 795

gac gag aac cag agc ccc cgc agc ttc cag aag aag acc cgc cac tac 2451
Asp Glu Asn Gln Ser Pro Arg Ser Phe Gln Lys Lys Thr Arg His Tyr
800 805 810

ttc atc gcc gcc gtg gag cgc ctg tgg gac tac ggc atg agc agc agc 2499
Phe Ile Ala Ala Val Glu Arg Leu Trp Asp Tyr Gly Met Ser Ser Ser
815 820 825

ccc cac gtg ctg cgc aac cgc gcc cag agc ggc agc gtg ccc cag ttc 2547
Pro His Val Leu Arg Asn Arg Ala Gln Ser Gly Ser Val Pro Gln Phe
830 835 840

aag aag gtg gtg ttc cag gag ttc acc gac ggc agc ttc acc cag ccc 2595
Lys Lys Val Val Phe Gln Glu Phe Thr Asp Gly Ser Phe Thr Gln Pro
845 850 855

ctg tac cgc ggc gag ctg aac gag cac ctg ggc ctg ctg ggc ccc tac 2643
Leu Tyr Arg Gly Glu Leu Asn Glu His Leu Gly Leu Leu Gly Pro Tyr
860 865 870 875

atc cgc gcc gag gtg gag gac aac atc atg gtg acc ttc cgc aac cag 2691
Ile Arg Ala Glu Val Glu Asp Asn Ile Met Val Thr Phe Arg Asn Gln
880 885 890

gcc agc cgc ccc tac agc ttc tac agc agc ctg atc agc tac gag gag 2739
Ala Ser Arg Pro Tyr Ser Phe Tyr Ser Ser Leu Ile Ser Tyr Glu Glu
895 900 905


gac cag cgc cag ggc gcc gag ccc cgc aag aac ttc gtg aag ccc aac 2787
Asp Gln Arg Gln Gly Ala Glu Pro Arg Lys Asn Phe Val Lys Pro Asn
910 915 920

gag acc aag acc tac ttc tgg aag gtg cag cac cac atg gcc ccc acc 2835
Glu Thr Lys Thr Tyr Phe Trp Lys Val Gln His His Met Ala Pro Thr
925 930 935

aag gac gag ttc gac tgc aag gcc tgg gcc tac ttc agc gac gtg gac 2883
Lys Asp Glu Phe Asp Cys Lys Ala Trp Ala Tyr Phe Ser Asp Val Asp
940 945 950 955

ctg gag aag gac gtg cac agc ggc ctg atc ggc ccc ctg ctg gtg tgc 2931
Leu Glu Lys Asp Val His Ser Gly Leu Ile Gly Pro Leu Leu Val Cys
960 965 970

cac acc aac acc ctg aac ccc gcc cac ggc cgc cag gtg acc gtg cag 2979
His Thr Asn Thr Leu Asn Pro Ala His Gly Arg Gln Val Thr Val Gln
975 980 985

gag ttc gcc ctg ttc ttc acc atc ttc gac gag acc aag agc tgg tac 3027
Glu Phe Ala Leu Phe Phe Thr Ile Phe Asp Glu Thr Lys Ser Trp Tyr
990 995 1000

ttc acc gag aac atg gag cgc aac tgc cgc gcc ccc tgc aac atc cag 3075
Phe Thr Glu Asn Met Glu Arg Asn Cys Arg Ala Pro Cys Asn Ile Gln
1005 1010 1015

atg gag gac ccc acc ttc aag gag aac tac cgc ttc cac gcc atc aac 3123
Met Glu Asp Pro Thr Phe Lys Glu Asn Tyr Arg Phe His Ala Ile Asn
1020 1025 1030 1035

ggc tac atc atg gac acc ctg ccc ggc ctg gtg atg gcc cag gac cag 3171
Gly Tyr Ile Met Asp Thr Leu Pro Gly Leu Val Met Ala Gln Asp Gln
1040 1045 1050

cgc atc cgc tgg tac ctg ctg agc atg ggc agc aac gag aac atc cac 3219
Arg Ile Arg Trp Tyr Leu Leu Ser Met Gly Ser Asn Glu Asn Ile His
1055 1060 1065

agc atc cac ttc agc ggc cac gtg ttc acc gtg cgc aag aag gag gag 3267
Ser Ile His Phe Ser Gly His Val Phe Thr Val Arg Lys Lys Glu Glu
1070 1075 1080

tac aag atg gcc ctg tac aac ctg tac ccc ggc gtg ttc gag acc gtg 3315
Tyr Lys Met Ala Leu Tyr Asn Leu Tyr Pro Gly Val Phe Glu Thr Val
1085 1090 1095

gag atg ctg ccc agc aag gcc ggc atc tgg cgc gtg gag tgc ctg atc 3363
Glu Met Leu Pro Ser Lys Ala Gly Ile Trp Arg Val Glu Cys Leu Ile
1100 1105 1110 1115

ggc gag cac ctg cac gcc ggc atg agc acc ctg ttc ctg gtg tac agc 3411
Gly Glu His Leu His Ala Gly Met Ser Thr Leu Phe Leu Val Tyr Ser
1120 1125 1130

aac aag tgc cag acc ccc ctg ggc atg gcc agc ggc cac atc cgc gac 3459
Asn Lys Cys Gln Thr Pro Leu Gly Met Ala Ser Gly His Ile Arg Asp
1135 1140 1145

ttc cag atc acc gcc agc ggc cag tac ggc cag tgg gcc ccc aag ctg 3507
Phe Gln Ile Thr Ala Ser Gly Gln Tyr Gly Gln Trp Ala Pro Lys Leu
1150 1155 1160


gcc cgc ctg cac tac agc ggc agc atc aac gcc tgg agc acc aag gag 3555
Ala Arg Leu His Tyr Ser Gly Ser Ile Asn Ala Trp Ser Thr Lys Glu
1165 1170 1175

ccc ttc agc tgg atc aag gtg gac ctg ctg gcc ccc atg atc atc cac 3603
Pro Phe Ser Trp Ile Lys Val Asp Leu Leu Ala Pro Met Ile Ile His
1180 1185 1190 1195

ggc atc aag acc cag ggc gcc cgc cag aag ttc agc agc ctg tac atc 3651
Gly Ile Lys Thr Gln Gly Ala Arg Gln Lys Phe Ser Ser Leu Tyr Ile
1200 1205 1210

agc cag ttc atc atc atg tac agc ctg gac ggc aag aag tgg cag acc 3699
Ser Gln Phe Ile Ile Met Tyr Ser Leu Asp Gly Lys Lys Trp Gln Thr
1215 1220 1225

tac cgc ggc aac agc acc ggc acc ctg atg gtg ttc ttc ggc aac gtg 3747
Tyr Arg Gly Asn Ser Thr Gly Thr Leu Met Val Phe Phe Gly Asn Val
1230 1235 1240

gac agc agc ggc atc aag cac aac atc ttc aac ccc ccc atc atc gcc 3795
Asp Ser Ser Gly Ile Lys His Asn Ile Phe Asn Pro Pro Ile Ile Ala
1245 1250 1255

cgc tac atc cgc ctg cac ccc acc cac tac agc atc cgc agc acc ctg 3843
Arg Tyr Ile Arg Leu His Pro Thr His Tyr Ser Ile Arg Ser Thr Leu
1260 1265 1270 1275

cgc atg gag ctg atg ggc tgc gac ctg aac agc tgc agc atg ccc ctg 3891
Arg Met Glu Leu Met Gly Cys Asp Leu Asn Ser Cys Ser Met Pro Leu
1280 1285 1290

ggc atg gag agc aag gcc atc agc gac gcc cag atc acc gcc agc agc 3939
Gly Met Glu Ser Lys Ala Ile Ser Asp Ala Gln Ile Thr Ala Ser Ser
1295 1300 1305

tac ttc acc aac atg ttc gcc acc tgg agc ccc agc aag gcc cgc ctg 3987
Tyr Phe Thr Asn Met Phe Ala Thr Trp Ser Pro Ser Lys Ala Arg Leu
1310 1315 1320

cac ctg cag ggc cgc agc aac gcc tgg cgc ccc cag gtg aac aac ccc 4035
His Leu Gln Gly Arg Ser Asn Ala Trp Arg Pro Gln Val Asn Asn Pro
1325 1330 1335

aag gag tgg ctg cag gtg gac ttc cag aag acc atg aag gtg acc ggc 4083
Lys Glu Trp Leu Gln Val Asp Phe Gln Lys Thr Met Lys Val Thr Gly
1340 1345 1350 1355

gtg acc acc cag ggc gtg aag agc ctg ctg acc agc atg tac gtg aag 4131
Val Thr Thr Gln Gly Val Lys Ser Leu Leu Thr Ser Met Tyr Val Lys
1360 1365 1370

gag ttc ctg atc agc agc agc cag gac ggc cac cag tgg acc ctg ttc 4179
Glu Phe Leu Ile Ser Ser Ser Gln Asp Gly His Gln Trp Thr Leu Phe
1375 1380 1385

ttc cag aac ggc aag gtg aag gtg ttc cag ggc aac cag gac agc ttc 4227
Phe Gln Asn Gly Lys Val Lys Val Phe Gln Gly Asn Gln Asp Ser Phe
1390 1395 1400

acc ccc gtg gtg aac agc ctg gac ccc ccc ctg ctg acc cgc tac ctg 4275
Thr Pro Val Val Asn Ser Leu Asp Pro Pro Leu Leu Thr Arg Tyr Leu
1405 1410 1415


cgc atc cac ccc cag agc tgg gtg cac cag atc gcc ctg cgc atg gag 4323
Arg Ile His Pro Gln Ser Trp Val His Gln Ile Ala Leu Arg Met Glu
1420 1425 1430 1435

gtg ctg ggc tgc gag gcc cag gac ctg tac tagctgcccg ggctacaagc 4373
Val Leu Gly Cys Glu Ala Gln Asp Leu Tyr
1440 1445

ttt 4376


<210> 2
<211> 1445
<212> PRT
<213> Artificial Sequence

<220>
<221> Artificial Sequence
<223> Modified HUMAN Factor VIII

<400> 2


Met Gln Ile Glu Leu Ser Thr Cys Phe Phe Leu Cys Leu Leu Arg Phe
1 5 10 15

Cys Phe Ser Ala Thr Arg Arg Tyr Tyr Leu Gly Ala Val Glu Leu Ser
20 25 30

Trp Asp Tyr Met Gln Ser Asp Leu Gly Glu Leu Pro Val Asp Ala Arg
35 40 45

Phe Pro Pro Arg Val Pro Lys Ser Phe Pro Phe Asn Thr Ser Val Val
50 55 60

Tyr Lys Lys Thr Leu Phe Val Glu Phe Thr Asp His Leu Phe Asn Ile
65 70 75 80

Ala Lys Pro Arg Pro Pro Trp Met Gly Leu Leu Gly Pro Thr Ile Gln
85 90 95

Ala Glu Val Tyr Asp Thr Val Val Ile Thr Leu Lys Asn Met Ala Ser
100 105 110

His Pro Val Ser Leu His Ala Val Gly Val Ser Tyr Trp Lys Ala Ser
115 120 125

Glu Gly Ala Glu Tyr Asp Asp Gln Thr Ser Gln Arg Glu Lys Glu Asp
130 135 140

Asp Lys Val Phe Pro Gly Gly Ser His Thr Tyr Val Trp Gln Val Leu
145 150 155 160

Lys Glu Asn Gly Pro Met Ala Ser Asp Pro Leu Cys Leu Thr Tyr Ser
165 170 175

Tyr Leu Ser His Val Asp Leu Val Lys Asp Leu Asn Ser Gly Leu Ile
180 185 190

Gly Ala Leu Leu Val Cys Arg Glu Gly Ser Leu Ala Lys Glu Lys Thr
195 200 205

Gln Thr Leu His Lys Phe Ile Leu Leu Phe Ala Val Phe Asp Glu Gly
210 215 220

Lys Ser Trp His Ser Glu Thr Lys
225 230

Ala Ala Ser Ala Arg Ala Trp Pro
245

Val Asn Arg Ser Leu Pro Gly Leu
260

Tyr Trp His Val Ile Gly Met Gly
275 280

Phe Leu Glu Gly His Thr Phe Leu
290 295

Leu Glu Ile Ser Pro Ile Thr Phe
305 310

Asp Leu Gly Gln Phe Leu Leu Phe
325

Asp Gly Met Glu Ala Tyr Val Lys
340

Asn 

Ser 

Leu Met 

Gin Asp Arg Asp 



235 

240 

Lys 

Met 

His Thr 

Val Asn Gly Tyr 


250 


255 

He 

Gly 

Cys His 

Arg Lys Ser Val 

265 



270 

Thr 

Thr 

Pro Glu 

Val His Ser He 




285 

Val 

Arg 

Asn His 

Arg Gin Ala Ser 



300 


Leu 

Thr 

Ala Gin 

Thr Leu Leu Met 



315 

320 

Cys 

His 

He Ser 

Ser His Gin His 


330 


335 

Val 

Asp 

Ser Cys 

Pro Glu Glu Pro 

345 


350 


Gin 

Leu 

Arg 

Met 



355 


Leu 

Thr 

Asp 

Ser 


370 



Pro 

Ser 

Phe 

He 

385 




Trp 

Val 

His 

Tyr 

Leu 

Val 

Leu 

Ala 




420 

Asn 

Glv 

Pro 

Gin 


435 


Ala 

Tyr 

Thr 

Asp 


450 



Ser 

Gly 

He 

Leu 

465 




Leu 

He 

He 

Phe 

His 

Gly 

He 

Thr 



500 

Glv 

Val 

Lvs 

His 


515 


Lys 

Tyr 

Lys 

Trp 


530 



Pro 

Arg 

Cys 

Leu 

545 




Asp 

Leu 

Ala 

Ser 

Ser 

Val 

Asp 

Gin 




580 

He 

Leu 

Phe 

Ser 



595 


Asn 

He 

Gin 

Arg 


610 



Pro 

Glu 

Phe 

Gin 

625 




Phe 

Asp 

Ser 

Leu 

Tyr 

He 

Leu 

Ser 



660 

Ser 

Gly 

Tyr 
* 

Thr 



675 


Leu 

Phe 

Pro 

Phe 


690 



Gly 

Leu 

Trp 

He 

705 




Met 

Thr 

Ala 

Leu 

Tyr 

Tyr 

Glu 

Asp 




740 

Asn 

Asn 

Ala 

He 



755 


Gin 

Ser 

Asp 

Gin 


770 



Met 

Lys 

Lys 

Glu 

785 




Pro 

Arg 

Ser 

Phe 

Glu 

Arg 

Leu 

Trp 



820 

Asn 

Arg 

Ala 

Gin 


835 


Gin 

Glu 

Phe 

Thr 


850 


Lys 

Asn 

Asn 

Glu 



360 

Glu 

Met 

Asp 

Val 



375 


Gin 

He 

Arg 

Ser 


390 



He 

Ala 

Ala 

Glu 

405 




Pro 

Asp 

Asp 

Arg 

Arg 

He 

Gly 

Arg 




440 

Glu 

Thr 

Phe 

Lys 



455 

Gly 

Pro 

Leu 

Leu 

470 



Lys 

Asn 

Gin 

Ala 

485 




Asp 

Val 

Arg 

Pro 

Leu 

Lys 

Asp 

Phe 




520 

Thr 

Val 

Thr 

Val 



535 


Thr 

Arg 

Tyr 

Tyr 


550 



Gly 

Leu 

He 

Gly 

565 




Arg 

Gly 

Asn 

Gin 

Val 

Phe 

Asp 

Glu 




600 

Phe 

Leu 

Pro 

Asn 



615 


Ala 

Ser 

Asn 

lie 


630 



Gin 

Leu 

Ser 

Val 

645 




He 

Gly 

Ala 

Gin 

Phe 

Lys 

His 

Lys 




680 

Ser 

Gly 

Glu 

Thr 



695 


Leu 

Gly 

Cys 

His 


710 



Leu 

Lys 

Val 

Ser 

725 




Ser 

Tyr 

GlU 

71 **** 

ASp 


Pro 

Arg 

Leu 





Glu 

Glu 

He 

Asp 



775 


Asp 

Phe 

Asp 

He 


790 



Pin 


a 


805 




Asp 

Tyr 

Gly 

Met 

Ser 

Gly 

Ser 

Val 




840 

Asp 

Gly 

Ser 

Phe 


855 


Glu 

Ala 

Glu 

Asp 

Val 

Arg 

Phe 

Asp 




380 

Val 

Ala 

Lys 

Lys 



395 


Glu 

Glu 

Asp 

Trp 


410 



Ser 

Tyr 

Lys 

Ser 

425 




Lys 

Tyr 

Lys 

Lys 

Thr 

Arg 

Glu 

Ala 



460 

Tvr 

Glv 

Glu 

Val 



475 


Ser 

Arg 

Pro 

Tyr 


490 



Leu 

Tyr 

Ser 

Arg 

505 




Pro 

He 

Leu 

Pro 

Glu 

Asp 

Gly 

Pro 




540 

Ser 

Ser 

Phe 

Val 



555 


Pro 

Leu 

Leu 

He 


570 



He 

Met 

Ser 

Asp 

585 



Asn 

Arg 

Ser 

Trp 

Pro 

Ala 

Gly 

Val 



620 

Met 

His 

Ser 

He 



635 


Cys 

Leu 

His 

Glu 

650 



Thr 

Asp 

Phe 

Leu 

665 



Met 

Val 

Tyr 

Glu 

Val 

Phe 

Met 

Ser 




700 

Asn 

Ser 

Asp 

Phe 



715 


Ser 

Cys 

Asp 

Lys 


■7 *i r\ 

730 



He 

Ser 

Ala 

Tyr 

745 




Glu 

Glu 

He 

Thr 

Tyr 

Asp 

Asp 

Thr 




780 

Tyr 

Asp 

Glu 

Asp 



795 


Arg 

His 

Tyr 

Phe 


810 



Ser 

Ser 

Ser 

Pro 

825 




Pro 

Gin 

Phe 

Lys 

Thr 

Gin 

Pro 

Leu 




860 


Tyr 

Asp 

Asp Asp 

365 



Asp 

Asp 

Asn Ser 

His 

Pro 

Lys Thr 



400 

Aso 

Tvr 

X 

Ala Pro 



415 

Gin 

Tyr 

Leu Asn 


430 


Val 

Arg 

Phe Met 

445 


He 

Gin 

His Glu 

Gly 

Asp 

Thr Leu 



480 

Asn 

He 

Tyr Pro 



495 

Arg 

Leu 

Pro Lys 


510 


Gly 

Glu 

He Phe 

525 



Thr 

Lys 

Ser Asp 

Asn 

Met 

; 

Glu Arg 



560 

Cvs 

Tvr 

Lys Glu 



575 

Lys 

Arg 

Asn Val 


590 


Tyr 

Leu 

Thr Glu 

605 



Gin 

Leu 

Glu Asp 

Asn 

Gly 

Tyr Val 



640 

Val 

Ala 

Tyr Trp 



655 

Ser 

Val 

Phe Phe 


670 


Asp 

Thr 

Leu Thr 

685 



Met 

Glu 

Asn Pro 

Arg 

Asn 

Arg Gly 



720 

Asn 

Thr 

Glv Asd 



735 

Leu 

Leu 

Ser Lys 


750 


Arg 

Thr 

Thr Leu 

765 



He 

Ser 

Val Glu 

Glu 

Asn 

Gin Ser 



800 

lie 

Axa 

Axa vax 



815 

His 

Val 

Leu Arg 


830 


Lys 

Val 

Val Phe 

845 



Tyr 

Arg 

Gly Glu 


Ser Ser Gln Asp Gly His Gln Trp Thr Leu Phe Phe Gln Asn Gly Lys
1380 1385 1390

Val Lys Val Phe Gln Gly Asn Gln Asp Ser Phe Thr Pro Val Val Asn
1395 1400 1405

Ser Leu Asp Pro Pro Leu Leu Thr Arg Tyr Leu Arg Ile His Pro Gln
1410 1415 1420

Ser Trp Val His Gln Ile Ala Leu Arg Met Glu Val Leu Gly Cys Glu
1425 1430 1435 1440

Ala Gln Asp Leu Tyr
1445

<210> 3 
<211> 4396 
<212> DNA 

<213> Artificial Sequence 
<220> 

<221> Artificial Sequence 

<220> 
<221> CDS 

<222> (19)...(4359)

<223> Modified Human Factor VIII

<400> 3 

tagaattcgt aggctagc atg cag atc gag ctg agc acc tgc ttc ttc ctg 51
Met Gln Ile Glu Leu Ser Thr Cys Phe Phe Leu
1 5 10

tgc ctg ctg cgc ttc tgc ttc agc gcc acc cgc cgc tac tac ctg ggc 99
Cys Leu Leu Arg Phe Cys Phe Ser Ala Thr Arg Arg Tyr Tyr Leu Gly
15 20 25

gcc gtg gag ctg agc tgg gac tac atg cag agc gac ctg ggc gag ctg 147
Ala Val Glu Leu Ser Trp Asp Tyr Met Gln Ser Asp Leu Gly Glu Leu
30 35 40

ccc gtg gac gcc cgc ttc ccc ccc cgc gtg ccc aag agc ttc ccc ttc 195
Pro Val Asp Ala Arg Phe Pro Pro Arg Val Pro Lys Ser Phe Pro Phe
45 50 55

aac acc agc gtg gtg tac aag aag acc ctg ttc gtg gag ttc acc gac 243
Asn Thr Ser Val Val Tyr Lys Lys Thr Leu Phe Val Glu Phe Thr Asp
60 65 70 75

cac ctg ttc aac atc gcc aag ccc cgc ccc ccc tgg atg ggc ctg ctg 291
His Leu Phe Asn Ile Ala Lys Pro Arg Pro Pro Trp Met Gly Leu Leu
80 85 90

ggc ccc acc atc cag gcc gag gtg tac gac acc gtg gtg atc acc ctg 339
Gly Pro Thr Ile Gln Ala Glu Val Tyr Asp Thr Val Val Ile Thr Leu
95 100 105

aag aac atg gcc agc cac ccc gtg agc ctg cac gcc gtg ggc gtg agc 387
Lys Asn Met Ala Ser His Pro Val Ser Leu His Ala Val Gly Val Ser
110 115 120

tac tgg aag gcc agc gag ggc gcc gag tac gac gac cag acc agc cag 435
Tyr Trp Lys Ala Ser Glu Gly Ala Glu Tyr Asp Asp Gln Thr Ser Gln
125 130 135

cgc gag aag gag gac gac aag gtg ttc ccc ggc ggc agc cac acc tac 483
Arg Glu Lys Glu Asp Asp Lys Val Phe Pro Gly Gly Ser His Thr Tyr
140 145 150 155


gtg tgg cag gtg ctg aag gag aac ggc ccc atg gcc agc gac ccc ctg 531
Val Trp Gln Val Leu Lys Glu Asn Gly Pro Met Ala Ser Asp Pro Leu
160 165 170

tgc ctg acc tac agc tac ctg agc cac gtg gac ctg gtg aag gac ctg 579
Cys Leu Thr Tyr Ser Tyr Leu Ser His Val Asp Leu Val Lys Asp Leu
175 180 185

aac agc ggc ctg atc ggc gcc ctg ctg gtg tgc cgc gag ggc agc ctg 627
Asn Ser Gly Leu Ile Gly Ala Leu Leu Val Cys Arg Glu Gly Ser Leu
190 195 200

gcc aag gag aag acc cag acc ctg cac aag ttc atc ctg ctg ttc gcc 675
Ala Lys Glu Lys Thr Gln Thr Leu His Lys Phe Ile Leu Leu Phe Ala
205 210 215

gtg ttc gac gag ggc aag agc tgg cac agc gag acc aag aac agc ctg 723
Val Phe Asp Glu Gly Lys Ser Trp His Ser Glu Thr Lys Asn Ser Leu
220 225 230 235

atg cag gac cgc gac gcc gcc agc gcc cgc gcc tgg ccc aag atg cac 771
Met Gln Asp Arg Asp Ala Ala Ser Ala Arg Ala Trp Pro Lys Met His
240 245 250

acc gtg aac ggc tac gtg aac cgc agc ctg ccc ggc ctg atc ggc tgc 819
Thr Val Asn Gly Tyr Val Asn Arg Ser Leu Pro Gly Leu Ile Gly Cys
255 260 265

cac cgc aag agc gtg tac tgg cac gtg atc ggc atg ggc acc acc ccc 867
His Arg Lys Ser Val Tyr Trp His Val Ile Gly Met Gly Thr Thr Pro
270 275 280

gag gtg cac agc atc ttc ctg gag ggc cac acc ttc ctg gtg cgc aac 915
Glu Val His Ser Ile Phe Leu Glu Gly His Thr Phe Leu Val Arg Asn
285 290 295

cac cgc cag gcc agc ctg gag atc agc ccc atc acc ttc ctg acc gcc 963
His Arg Gln Ala Ser Leu Glu Ile Ser Pro Ile Thr Phe Leu Thr Ala
300 305 310 315

cag acc ctg ctg atg gac ctg ggc cag ttc ctg ctg ttc tgc cac atc 1011
Gln Thr Leu Leu Met Asp Leu Gly Gln Phe Leu Leu Phe Cys His Ile
320 325 330

agc agc cac cag cac gac ggc atg gag gcc tac gtg aag gtg gac agc 1059
Ser Ser His Gln His Asp Gly Met Glu Ala Tyr Val Lys Val Asp Ser
335 340 345

tgc ccc gag gag ccc cag ctg cgc atg aag aac aac gag gag gcc gag 1107
Cys Pro Glu Glu Pro Gln Leu Arg Met Lys Asn Asn Glu Glu Ala Glu
350 355 360

gac tac gac gac gac ctg acc gac agc gag atg gac gtg gtg cgc ttc 1155
Asp Tyr Asp Asp Asp Leu Thr Asp Ser Glu Met Asp Val Val Arg Phe
365 370 375

gac gac gac aac agc ccc agc ttc atc cag atc cgc agc gtg gcc aag 1203
Asp Asp Asp Asn Ser Pro Ser Phe Ile Gln Ile Arg Ser Val Ala Lys
380 385 390 395

aag cac ccc aag acc tgg gtg cac tac atc gcc gcc gag gag gag gac 1251
Lys His Pro Lys Thr Trp Val His Tyr Ile Ala Ala Glu Glu Glu Asp
400 405 410


tgg gac tac gcc ccc ctg gtg ctg gcc ccc gac gac cgc agc tac aag 1299
Trp Asp Tyr Ala Pro Leu Val Leu Ala Pro Asp Asp Arg Ser Tyr Lys
415 420 425

agc cag tac ctg aac aac ggc ccc cag cgc atc ggc cgc aag tac aag 1347
Ser Gln Tyr Leu Asn Asn Gly Pro Gln Arg Ile Gly Arg Lys Tyr Lys
430 435 440

aag gtg cgc ttc atg gcc tac acc gac gag acc ttc aag acc cgc gag 1395
Lys Val Arg Phe Met Ala Tyr Thr Asp Glu Thr Phe Lys Thr Arg Glu
445 450 455

gcc atc cag cac gag agc ggc atc ctg ggc ccc ctg ctg tac ggc gag 1443
Ala Ile Gln His Glu Ser Gly Ile Leu Gly Pro Leu Leu Tyr Gly Glu
460 465 470 475

gtg ggc gac acc ctg ctg atc atc ttc aag aac cag gcc agc cgc ccc 1491
Val Gly Asp Thr Leu Leu Ile Ile Phe Lys Asn Gln Ala Ser Arg Pro
480 485 490

tac aac atc tac ccc cac ggc atc acc gac gtg cgc ccc ctg tac agc 1539
Tyr Asn Ile Tyr Pro His Gly Ile Thr Asp Val Arg Pro Leu Tyr Ser
495 500 505

cgc cgc ctg ccc aag ggc gtg aag cac ctg aag gac ttc ccc atc ctg 1587
Arg Arg Leu Pro Lys Gly Val Lys His Leu Lys Asp Phe Pro Ile Leu
510 515 520

ccc ggc gag atc ttc aag tac aag tgg acc gtg acc gtg gag gac ggc 1635
Pro Gly Glu Ile Phe Lys Tyr Lys Trp Thr Val Thr Val Glu Asp Gly
525 530 535

ccc acc aag agc gac ccc cgc tgc ctg acc cgc tac tac agc agc ttc 1683
Pro Thr Lys Ser Asp Pro Arg Cys Leu Thr Arg Tyr Tyr Ser Ser Phe
540 545 550 555

gtg aac atg gag cgc gac ctg gcc agc ggc ctg atc ggc ccc ctg ctg 1731
Val Asn Met Glu Arg Asp Leu Ala Ser Gly Leu Ile Gly Pro Leu Leu
560 565 570

atc tgc tac aag gag agc gtg gac cag cgc ggc aac cag atc atg agc 1779
Ile Cys Tyr Lys Glu Ser Val Asp Gln Arg Gly Asn Gln Ile Met Ser
575 580 585

gac aag cgc aac gtg atc ctg ttc agc gtg ttc gac gag aac cgc agc 1827
Asp Lys Arg Asn Val Ile Leu Phe Ser Val Phe Asp Glu Asn Arg Ser
590 595 600

tgg tac ctg acc gag aac atc cag cgc ttc ctg ccc aac ccc gcc ggc 1875
Trp Tyr Leu Thr Glu Asn Ile Gln Arg Phe Leu Pro Asn Pro Ala Gly
605 610 615

gtg cag ctg gag gac ccc gag ttc cag gcc agc aac atc atg cac agc 1923
Val Gln Leu Glu Asp Pro Glu Phe Gln Ala Ser Asn Ile Met His Ser
620 625 630 635

atc aac ggc tac gtg ttc gac agc ctg cag ctg agc gtg tgc ctg cac 1971
Ile Asn Gly Tyr Val Phe Asp Ser Leu Gln Leu Ser Val Cys Leu His
640 645 650


gag gtg gcc tac tgg tac atc ctg agc atc ggc gcc cag acc gac ttc 2019
Glu Val Ala Tyr Trp Tyr Ile Leu Ser Ile Gly Ala Gln Thr Asp Phe
655 660 665

ctg agc gtg ttc ttc agc ggc tac acc ttc aag cac aag atg gtg tac 2067
Leu Ser Val Phe Phe Ser Gly Tyr Thr Phe Lys His Lys Met Val Tyr
670 675 680

gag gac acc ctg acc ctg ttc ccc ttc agc ggc gag acc gtg ttc atg 2115
Glu Asp Thr Leu Thr Leu Phe Pro Phe Ser Gly Glu Thr Val Phe Met
685 690 695

agc atg gag aac ccc ggc ctg tgg atc ctg ggc tgc cac aac agc gac 2163
Ser Met Glu Asn Pro Gly Leu Trp Ile Leu Gly Cys His Asn Ser Asp
700 705 710 715

ttc cgc aac cgc ggc atg acc gcc ctg ctg aag gtg agc agc tgc gac 2211
Phe Arg Asn Arg Gly Met Thr Ala Leu Leu Lys Val Ser Ser Cys Asp
720 725 730

aag aac acc ggc gac tac tac gag gac agc tac gag gac atc agc gcc 2259
Lys Asn Thr Gly Asp Tyr Tyr Glu Asp Ser Tyr Glu Asp Ile Ser Ala
735 740 745

tac ctg ctg agc aag aac aac gcc atc gag ccc cgc agg cgc agg cgc 2307
Tyr Leu Leu Ser Lys Asn Asn Ala Ile Glu Pro Arg Arg Arg Arg Arg
750 755 760

gag atc acc cgc acc acc ctg cag agc gac cag gag gag atc gac tac 2355
Glu Ile Thr Arg Thr Thr Leu Gln Ser Asp Gln Glu Glu Ile Asp Tyr
765 770 775

gac gac acc atc agc gtg gag atg aag aag gag gac ttc gac atc tac 2403
Asp Asp Thr Ile Ser Val Glu Met Lys Lys Glu Asp Phe Asp Ile Tyr
780 785 790 795

gac gag gac gag aac cag agc ccc cgc agc ttc cag aag aag acc cgc 2451
Asp Glu Asp Glu Asn Gln Ser Pro Arg Ser Phe Gln Lys Lys Thr Arg
800 805 810

cac tac ttc atc gcc gcc gtg gag cgc ctg tgg gac tac ggc atg agc 2499
His Tyr Phe Ile Ala Ala Val Glu Arg Leu Trp Asp Tyr Gly Met Ser
815 820 825

agc agc ccc cac gtg ctg cgc aac cgc gcc cag agc ggc agc gtg ccc 2547
Ser Ser Pro His Val Leu Arg Asn Arg Ala Gln Ser Gly Ser Val Pro
830 835 840

cag ttc aag aag gtg gtg ttc cag gag ttc acc gac ggc agc ttc acc 2595
Gln Phe Lys Lys Val Val Phe Gln Glu Phe Thr Asp Gly Ser Phe Thr
845 850 855

cag ccc ctg tac cgc ggc gag ctg aac gag cac ctg ggc ctg ctg ggc 2643
Gln Pro Leu Tyr Arg Gly Glu Leu Asn Glu His Leu Gly Leu Leu Gly
860 865 870 875

ccc tac atc cgc gcc gag gtg gag gac aac atc atg gtg acc ttc cgc 2691
Pro Tyr Ile Arg Ala Glu Val Glu Asp Asn Ile Met Val Thr Phe Arg
880 885 890

aac cag gcc agc cgc ccc tac agc ttc tac agc agc ctg atc agc tac 2739
Asn Gln Ala Ser Arg Pro Tyr Ser Phe Tyr Ser Ser Leu Ile Ser Tyr
895 900 905

gag gag gac cag cgc cag ggc gcc gag ccc cgc aag aac ttc gtg aag 2787
Glu Glu Asp Gln Arg Gln Gly Ala Glu Pro Arg Lys Asn Phe Val Lys
910 915 920


ccc aac gag acc aag acc tac ttc tgg aag gtg cag cac cac atg gcc 2835
Pro Asn Glu Thr Lys Thr Tyr Phe Trp Lys Val Gln His His Met Ala
925 930 935

ccc acc aag gac gag ttc gac tgc aag gcc tgg gcc tac ttc agc gac 2883
Pro Thr Lys Asp Glu Phe Asp Cys Lys Ala Trp Ala Tyr Phe Ser Asp
940 945 950 955

gtg gac ctg gag aag gac gtg cac agc ggc ctg atc ggc ccc ctg ctg 2931
Val Asp Leu Glu Lys Asp Val His Ser Gly Leu Ile Gly Pro Leu Leu
960 965 970

gtg tgc cac acc aac acc ctg aac ccc gcc cac ggc cgc cag gtg acc 2979
Val Cys His Thr Asn Thr Leu Asn Pro Ala His Gly Arg Gln Val Thr
975 980 985

gtg cag gag ttc gcc ctg ttc ttc acc atc ttc gac gag acc aag agc 3027
Val Gln Glu Phe Ala Leu Phe Phe Thr Ile Phe Asp Glu Thr Lys Ser
990 995 1000

tgg tac ttc acc gag aac atg gag cgc aac tgc cgc gcc ccc tgc aac 3075
Trp Tyr Phe Thr Glu Asn Met Glu Arg Asn Cys Arg Ala Pro Cys Asn
1005 1010 1015

atc cag atg gag gac ccc acc ttc aag gag aac tac cgc ttc cac gcc 3123
Ile Gln Met Glu Asp Pro Thr Phe Lys Glu Asn Tyr Arg Phe His Ala
1020 1025 1030 1035

atc aac ggc tac atc atg gac acc ctg ccc ggc ctg gtg atg gcc cag 3171
Ile Asn Gly Tyr Ile Met Asp Thr Leu Pro Gly Leu Val Met Ala Gln
1040 1045 1050

gac cag cgc atc cgc tgg tac ctg ctg agc atg ggc agc aac gag aac 3219
Asp Gln Arg Ile Arg Trp Tyr Leu Leu Ser Met Gly Ser Asn Glu Asn
1055 1060 1065

atc cac agc atc cac ttc agc ggc cac gtg ttc acc gtg cgc aag aag 3267
Ile His Ser Ile His Phe Ser Gly His Val Phe Thr Val Arg Lys Lys
1070 1075 1080

gag gag tac aag atg gcc ctg tac aac ctg tac ccc ggc gtg ttc gag 3315
Glu Glu Tyr Lys Met Ala Leu Tyr Asn Leu Tyr Pro Gly Val Phe Glu
1085 1090 1095

acc gtg gag atg ctg ccc agc aag gcc ggc atc tgg cgc gtg gag tgc 3363
Thr Val Glu Met Leu Pro Ser Lys Ala Gly Ile Trp Arg Val Glu Cys
1100 1105 1110 1115

ctg atc ggc gag cac ctg cac gcc ggc atg agc acc ctg ttc ctg gtg 3411
Leu Ile Gly Glu His Leu His Ala Gly Met Ser Thr Leu Phe Leu Val
1120 1125 1130

tac agc aac aag tgc cag acc ccc ctg ggc atg gcc agc ggc cac atc 3459
Tyr Ser Asn Lys Cys Gln Thr Pro Leu Gly Met Ala Ser Gly His Ile
1135 1140 1145

cgc gac ttc cag atc acc gcc agc ggc cag tac ggc cag tgg gcc ccc 3507
Arg Asp Phe Gln Ile Thr Ala Ser Gly Gln Tyr Gly Gln Trp Ala Pro
1150 1155 1160

aag ctg gcc cgc ctg cac tac agc ggc agc atc aac gcc tgg agc acc 3555
Lys Leu Ala Arg Leu His Tyr Ser Gly Ser Ile Asn Ala Trp Ser Thr
1165 1170 1175


aag gag ccc ttc agc tgg atc aag gtg gac ctg ctg gcc ccc atg atc 3603
Lys Glu Pro Phe Ser Trp Ile Lys Val Asp Leu Leu Ala Pro Met Ile
1180 1185 1190 1195

atc cac ggc atc aag acc cag ggc gcc cgc cag aag ttc agc agc ctg 3651
Ile His Gly Ile Lys Thr Gln Gly Ala Arg Gln Lys Phe Ser Ser Leu
1200 1205 1210

tac atc agc cag ttc atc atc atg tac agc ctg gac ggc aag aag tgg 3699
Tyr Ile Ser Gln Phe Ile Ile Met Tyr Ser Leu Asp Gly Lys Lys Trp
1215 1220 1225

cag acc tac cgc ggc aac agc acc ggc acc ctg atg gtg ttc ttc ggc 3747
Gln Thr Tyr Arg Gly Asn Ser Thr Gly Thr Leu Met Val Phe Phe Gly
1230 1235 1240

aac gtg gac agc agc ggc atc aag cac aac atc ttc aac ccc ccc atc 3795
Asn Val Asp Ser Ser Gly Ile Lys His Asn Ile Phe Asn Pro Pro Ile
1245 1250 1255

atc gcc cgc tac atc cgc ctg cac ccc acc cac tac agc atc cgc agc 3843
Ile Ala Arg Tyr Ile Arg Leu His Pro Thr His Tyr Ser Ile Arg Ser
1260 1265 1270 1275

acc ctg cgc atg gag ctg atg ggc tgc gac ctg aac agc tgc agc atg 3891
Thr Leu Arg Met Glu Leu Met Gly Cys Asp Leu Asn Ser Cys Ser Met
1280 1285 1290

ccc ctg ggc atg gag agc aag gcc atc agc gac gcc cag atc acc gcc 3939
Pro Leu Gly Met Glu Ser Lys Ala Ile Ser Asp Ala Gln Ile Thr Ala
1295 1300 1305

agc agc tac ttc acc aac atg ttc gcc acc tgg agc ccc agc aag gcc 3987
Ser Ser Tyr Phe Thr Asn Met Phe Ala Thr Trp Ser Pro Ser Lys Ala
1310 1315 1320

cgc ctg cac ctg cag ggc cgc agc aac gcc tgg cgc ccc cag gtg aac 4035
Arg Leu His Leu Gln Gly Arg Ser Asn Ala Trp Arg Pro Gln Val Asn
1325 1330 1335

aac ccc aag gag tgg ctg cag gtg gac ttc cag aag acc atg aag gtg 4083
Asn Pro Lys Glu Trp Leu Gln Val Asp Phe Gln Lys Thr Met Lys Val
1340 1345 1350 1355

acc ggc gtg acc acc cag ggc gtg aag agc ctg ctg acc agc atg tac 4131
Thr Gly Val Thr Thr Gln Gly Val Lys Ser Leu Leu Thr Ser Met Tyr
1360 1365 1370

gtg aag gag ttc ctg atc agc agc agc cag gac ggc cac cag tgg acc 4179
Val Lys Glu Phe Leu Ile Ser Ser Ser Gln Asp Gly His Gln Trp Thr
1375 1380 1385

ctg ttc ttc cag aac ggc aag gtg aag gtg ttc cag ggc aac cag gac 4227
Leu Phe Phe Gln Asn Gly Lys Val Lys Val Phe Gln Gly Asn Gln Asp
1390 1395 1400

agc ttc acc ccc gtg gtg aac agc ctg gac ccc ccc ctg ctg acc cgc 4275
Ser Phe Thr Pro Val Val Asn Ser Leu Asp Pro Pro Leu Leu Thr Arg
1405 1410 1415

tac ctg cgc atc cac ccc cag agc tgg gtg cac cag atc gcc ctg cgc 4323
Tyr Leu Arg Ile His Pro Gln Ser Trp Val His Gln Ile Ala Leu Arg
1420 1425 1430 1435

atg gag gtg ctg ggc tgc gag gcc cag gac ctg tac tagctgcccg 4381
Met Glu Val Leu Gly Cys Glu Ala Gln Asp Leu Tyr
1440 1445

ggctacaagc tttac 4396
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25 



30 

Trp Asp Tyr Met Gln Ser Asp Leu Gly Glu Leu Pro Val Asp Ala Arg
35 40 45

Phe Pro Pro Arg Val Pro Lys Ser Phe Pro Phe Asn Thr Ser Val Val
50 55 60

Tyr Lys Lys Thr Leu Phe Val Glu Phe Thr Asp His Leu Phe Asn Ile
65 70 75 80

Ala Lys Pro Arg Pro Pro Trp Met Gly Leu Leu Gly Pro Thr Ile Gln
85 90 95

Ala Glu Val Tyr Asp Thr Val Val Ile Thr Leu Lys Asn Met Ala Ser
100 105 110

His Pro Val Ser Leu His Ala Val Gly Val Ser Tyr Trp Lys Ala Ser
115 120 125

Glu Gly Ala Glu Tyr Asp Asp Gln Thr Ser Gln Arg Glu Lys Glu Asp
130 135 140

Asp Lys Val Phe Pro Gly Gly Ser His Thr Tyr Val Trp Gln Val Leu
145 150 155 160

Lys Glu Asn Gly Pro Met Ala Ser Asp Pro Leu Cys Leu Thr Tyr Ser
165 170 175

Tyr Leu Ser His Val Asp Leu Val Lys Asp Leu Asn Ser Gly Leu Ile
180 185 190

Gly Ala Leu Leu Val Cys Arg Glu Gly Ser Leu Ala Lys Glu Lys Thr
195 200 205

Gln Thr Leu His Lys Phe Ile Leu Leu Phe Ala Val Phe Asp Glu Gly
210 215 220

Lys Ser Trp His Ser Glu Thr Lys Asn Ser Leu Met Gln Asp Arg Asp
225 230 235 240

Ala Ala Ser Ala Arg Ala Trp Pro Lys Met His Thr Val Asn Gly Tyr
245 250 255

Val Asn Arg Ser Leu Pro Gly Leu Ile Gly Cys His Arg Lys Ser Val
260 265 270

Tyr Trp His Val Ile Gly Met Gly Thr Thr Pro Glu Val His Ser Ile
275 280 285

Phe Leu Glu Gly His Thr Phe Leu Val Arg Asn His Arg Gln Ala Ser
290 295 300

Leu Glu Ile Ser Pro Ile Thr Phe Leu Thr Ala Gln Thr Leu Leu Met
305 310 315 320

Asp Leu Gly Gln Phe Leu Leu Phe Cys His Ile Ser Ser His Gln His
325 330 335

Asp Gly Met Glu Ala Tyr Val Lys Val Asp Ser Cys Pro Glu Glu Pro
340 345 350

Gln Leu Arg Met Lys Asn Asn Glu Glu Ala Glu Asp Tyr Asp Asp Asp
355 360 365

Leu Thr Asp Ser Glu Met Asp Val Val Arg Phe Asp Asp Asp Asn Ser
370 375 380

Pro Ser Phe Ile Gln Ile Arg Ser Val Ala Lys Lys His Pro Lys Thr
385 390 395 400

Trp Val His Tyr Ile Ala Ala Glu Glu Glu Asp Trp Asp Tyr Ala Pro
405 410 415

Leu Val Leu Ala Pro Asp Asp Arg Ser Tyr Lys Ser Gln Tyr Leu Asn
420 425 430

Asn Gly Pro Gln Arg Ile Gly Arg Lys Tyr Lys Lys Val Arg Phe Met
435 440 445

Ala Tyr Thr Asp Glu Thr Phe Lys Thr Arg Glu Ala Ile Gln His Glu
450 455 460

Ser Gly Ile Leu Gly Pro Leu Leu Tyr Gly Glu Val Gly Asp Thr Leu
465 470 475 480

Leu Ile Ile Phe Lys Asn Gln Ala Ser Arg Pro Tyr Asn Ile Tyr Pro
485 490 495

His Gly Ile Thr Asp Val Arg Pro Leu Tyr Ser Arg Arg Leu Pro Lys
500 505 510

Gly Val Lys His Leu Lys Asp Phe Pro Ile Leu Pro Gly Glu Ile Phe
515 520 525

Lys Tyr Lys Trp Thr Val Thr Val Glu Asp Gly Pro Thr Lys Ser Asp
530 535 540

Pro Arg Cys Leu Thr Arg Tyr Tyr Ser Ser Phe Val Asn Met Glu Arg
545 550 555 560

Asp Leu Ala Ser Gly Leu Ile Gly Pro Leu Leu Ile Cys Tyr Lys Glu
565 570 575

Ser Val Asp Gln Arg Gly Asn Gln Ile Met Ser Asp Lys Arg Asn Val
580 585 590

Ile Leu Phe Ser Val Phe Asp Glu Asn Arg Ser Trp Tyr Leu Thr Glu
595 600 605

Asn Ile Gln Arg Phe Leu Pro Asn Pro Ala Gly Val Gln Leu Glu Asp
610 615 620

Pro Glu Phe Gln Ala Ser Asn Ile Met His Ser Ile Asn Gly Tyr Val
625 630 635 640

Phe Asp Ser Leu Gln Leu Ser Val Cys Leu His Glu Val Ala Tyr Trp
645 650 655

Tyr Ile Leu Ser Ile Gly Ala Gln Thr Asp Phe Leu Ser Val Phe Phe
660 665 670

Ser Gly Tyr Thr Phe Lys His Lys Met Val Tyr Glu Asp Thr Leu Thr
675 680 685

Leu Phe Pro Phe Ser Gly Glu Thr Val Phe Met Ser Met Glu Asn Pro
690 695 700

Gly Leu Trp Ile Leu Gly Cys His Asn Ser Asp Phe Arg Asn Arg Gly
705 710 715 720

Met Thr Ala Leu Leu Lys Val Ser Ser Cys Asp Lys Asn Thr Gly Asp
725 730 735

Tyr Tyr Glu Asp Ser Tyr Glu Asp Ile Ser Ala Tyr Leu Leu Ser Lys
740 745 750

Asn Asn Ala Ile Glu Pro Arg Arg Arg Arg Arg Glu Ile Thr Arg Thr
755 760 765

Thr Leu Gln Ser Asp Gln Glu Glu Ile Asp Tyr Asp Asp Thr Ile Ser
770 775 780

Val Glu Met Lys Lys Glu Asp Phe Asp Ile Tyr Asp Glu Asp Glu Asn
785 790 795 800

Gln Ser Pro Arg Ser Phe Gln Lys Lys Thr Arg His Tyr Phe Ile Ala
805 810 815

Ala Val Glu Arg Leu Trp Asp Tyr Gly Met Ser Ser Ser Pro His Val
820 825 830

Leu Arg Asn Arg Ala Gln Ser Gly Ser Val Pro Gln Phe Lys Lys Val
835 840 845

Val Phe Gln Glu Phe Thr Asp Gly Ser Phe Thr Gln Pro Leu Tyr Arg
850 855 860

Gly Glu Leu Asn Glu His Leu Gly Leu Leu Gly Pro Tyr Ile Arg Ala
865 870 875 880


Glu Val Glu Asp Asn Ile Met Val Thr Phe Arg Asn Gln Ala Ser Arg
885 890 895

Pro Tyr Ser Phe Tyr Ser Ser Leu Ile Ser Tyr Glu Glu Asp Gln Arg
900 905 910

Gln Gly Ala Glu Pro Arg Lys Asn Phe Val Lys Pro Asn Glu Thr Lys
915 920 925

Thr Tyr Phe Trp Lys Val Gln His His Met Ala Pro Thr Lys Asp Glu
930 935 940

Phe Asp Cys Lys Ala Trp Ala Tyr Phe Ser Asp Val Asp Leu Glu Lys
945 950 955 960

Asp Val His Ser Gly Leu Ile Gly Pro Leu Leu Val Cys His Thr Asn
965 970 975

Thr Leu Asn Pro Ala His Gly Arg Gln Val Thr Val Gln Glu Phe Ala
980 985 990

Leu Phe Phe Thr Ile Phe Asp Glu Thr Lys Ser Trp Tyr Phe Thr Glu
995 1000 1005

Asn Met Glu Arg Asn Cys Arg Ala Pro Cys Asn Ile Gln Met Glu Asp
1010 1015 1020

Pro Thr Phe Lys Glu Asn Tyr Arg Phe His Ala Ile Asn Gly Tyr Ile
1025 1030 1035 1040

Met Asp Thr Leu Pro Gly Leu Val Met Ala Gln Asp Gln Arg Ile Arg
1045 1050 1055

Trp Tyr Leu Leu Ser Met Gly Ser Asn Glu Asn Ile His Ser Ile His
1060 1065 1070

Phe Ser Gly His Val Phe Thr Val Arg Lys Lys Glu Glu Tyr Lys Met
1075 1080 1085

Ala Leu Tyr Asn Leu Tyr Pro Gly Val Phe Glu Thr Val Glu Met Leu
1090 1095 1100

Pro Ser Lys Ala Gly Ile Trp Arg Val Glu Cys Leu Ile Gly Glu His
1105 1110 1115 1120

Leu His Ala Gly Met Ser Thr Leu Phe Leu Val Tyr Ser Asn Lys Cys
1125 1130 1135

Gln Thr Pro Leu Gly Met Ala Ser Gly His Ile Arg Asp Phe Gln Ile
1140 1145 1150

Thr Ala Ser Gly Gln Tyr Gly Gln Trp Ala Pro Lys Leu Ala Arg Leu
1155 1160 1165

His Tyr Ser Gly Ser Ile Asn Ala Trp Ser Thr Lys Glu Pro Phe Ser
1170 1175 1180

Trp Ile Lys Val Asp Leu Leu Ala Pro Met Ile Ile His Gly Ile Lys
1185 1190 1195 1200

Thr Gln Gly Ala Arg Gln Lys Phe Ser Ser Leu Tyr Ile Ser Gln Phe
1205 1210 1215

Ile Ile Met Tyr Ser Leu Asp Gly Lys Lys Trp Gln Thr Tyr Arg Gly
1220 1225 1230

Asn Ser Thr Gly Thr Leu Met Val Phe Phe Gly Asn Val Asp Ser Ser
1235 1240 1245

Gly Ile Lys His Asn Ile Phe Asn Pro Pro Ile Ile Ala Arg Tyr Ile
1250 1255 1260

Arg Leu His Pro Thr His Tyr Ser Ile Arg Ser Thr Leu Arg Met Glu
1265 1270 1275 1280

Leu Met Gly Cys Asp Leu Asn Ser Cys Ser Met Pro Leu Gly Met Glu
1285 1290 1295

Ser Lys Ala Ile Ser Asp Ala Gln Ile Thr Ala Ser Ser Tyr Phe Thr
1300 1305 1310

Asn Met Phe Ala Thr Trp Ser Pro Ser Lys Ala Arg Leu His Leu Gln
1315 1320 1325

Gly Arg Ser Asn Ala Trp Arg Pro Gln Val Asn Asn Pro Lys Glu Trp
1330 1335 1340

Leu Gln Val Asp Phe Gln Lys Thr Met Lys Val Thr Gly Val Thr Thr
1345 1350 1355 1360

Gln Gly Val Lys Ser Leu Leu Thr Ser Met Tyr Val Lys Glu Phe Leu
1365 1370 1375

Ile Ser Ser Ser Gln Asp Gly His Gln Trp Thr Leu Phe Phe Gln Asn
1380 1385 1390

Gly Lys Val Lys Val Phe Gln Gly Asn Gln Asp Ser Phe Thr Pro Val
1395 1400 1405

Val Asn Ser Leu Asp Pro Pro Leu Leu Thr Arg Tyr Leu Arg Ile His
1410 1415 1420

Pro Gln Ser Trp Val His Gln Ile Ala Leu Arg Met Glu Val Leu Gly
1425 1430 1435 1440

Cys Glu Ala Gln Asp Leu Tyr
1445


Leu Asn Glu His Leu Gly Leu Leu Gly Pro Tyr Ile Arg Ala Glu Val
865 870 875 880

Glu Asp Asn Ile Met Val Thr Phe Arg Asn Gln Ala Ser Arg Pro Tyr
885 890 895

Ser Phe Tyr Ser Ser Leu Ile Ser Tyr Glu Glu Asp Gln Arg Gln Gly
900 905 910

Ala Glu Pro Arg Lys Asn Phe Val Lys Pro Asn Glu Thr Lys Thr Tyr
915 920 925

Phe Trp Lys Val Gln His His Met Ala Pro Thr Lys Asp Glu Phe Asp
930 935 940

Cys Lys Ala Trp Ala Tyr Phe Ser Asp Val Asp Leu Glu Lys Asp Val
945 950 955 960

His Ser Gly Leu Ile Gly Pro Leu Leu Val Cys His Thr Asn Thr Leu
965 970 975

Asn Pro Ala His Gly Arg Gln Val Thr Val Gln Glu Phe Ala Leu Phe
980 985 990

Phe Thr Ile Phe Asp Glu Thr Lys Ser Trp Tyr Phe Thr Glu Asn Met
995 1000 1005

Glu Arg Asn Cys Arg Ala Pro Cys Asn Ile Gln Met Glu Asp Pro Thr
1010 1015 1020

Phe Lys Glu Asn Tyr Arg Phe His Ala Ile Asn Gly Tyr Ile Met Asp
1025 1030 1035 1040

Thr Leu Pro Gly Leu Val Met Ala Gln Asp Gln Arg Ile Arg Trp Tyr
1045 1050 1055

Leu Leu Ser Met Gly Ser Asn Glu Asn Ile His Ser Ile His Phe Ser
1060 1065 1070

Gly His Val Phe Thr Val Arg Lys Lys Glu Glu Tyr Lys Met Ala Leu
1075 1080 1085

Tyr Asn Leu Tyr Pro Gly Val Phe Glu Thr Val Glu Met Leu Pro Ser
1090 1095 1100

Lys Ala Gly Ile Trp Arg Val Glu Cys Leu Ile Gly Glu His Leu His
1105 1110 1115 1120

Ala Gly Met Ser Thr Leu Phe Leu Val Tyr Ser Asn Lys Cys Gln Thr
1125 1130 1135

Pro Leu Gly Met Ala Ser Gly His Ile Arg Asp Phe Gln Ile Thr Ala
1140 1145 1150

Ser Gly Gln Tyr Gly Gln Trp Ala Pro Lys Leu Ala Arg Leu His Tyr
1155 1160 1165

Ser Gly Ser Ile Asn Ala Trp Ser Thr Lys Glu Pro Phe Ser Trp Ile
1170 1175 1180

Lys Val Asp Leu Leu Ala Pro Met Ile Ile His Gly Ile Lys Thr Gln
1185 1190 1195 1200

Gly Ala Arg Gln Lys Phe Ser Ser Leu Tyr Ile Ser Gln Phe Ile Ile
1205 1210 1215

Met Tyr Ser Leu Asp Gly Lys Lys Trp Gln Thr Tyr Arg Gly Asn Ser
1220 1225 1230

Thr Gly Thr Leu Met Val Phe Phe Gly Asn Val Asp Ser Ser Gly Ile
1235 1240 1245

Lys His Asn Ile Phe Asn Pro Pro Ile Ile Ala Arg Tyr Ile Arg Leu
1250 1255 1260

His Pro Thr His Tyr Ser Ile Arg Ser Thr Leu Arg Met Glu Leu Met
1265 1270 1275 1280

Gly Cys Asp Leu Asn Ser Cys Ser Met Pro Leu Gly Met Glu Ser Lys
1285 1290 1295

Ala Ile Ser Asp Ala Gln Ile Thr Ala Ser Ser Tyr Phe Thr Asn Met
1300 1305 1310

Phe Ala Thr Trp Ser Pro Ser Lys Ala Arg Leu His Leu Gln Gly Arg
1315 1320 1325

Ser Asn Ala Trp Arg Pro Gln Val Asn Asn Pro Lys Glu Trp Leu Gln
1330 1335 1340

Val Asp Phe Gln Lys Thr Met Lys Val Thr Gly Val Thr Thr Gln Gly
1345 1350 1355 1360

Val Lys Ser Leu Leu Thr Ser Met Tyr Val Lys Glu Phe Leu Ile Ser
1365 1370 1375


